
## Principal Component Analysis (PCA)
PCA is a dimensionality reduction technique that projects high-dimensional data onto a small number of orthogonal directions (the principal components), chosen so that the projection retains as much of the data's variance as possible.
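
As a rough illustration of what happens under the hood, PCA can be sketched with a singular value decomposition of the centered data. The synthetic data and all variable names below are assumptions made for this example; this is not how any particular library implements it.

```python
import numpy as np

# Synthetic 3-D data that mostly varies along one direction (illustrative only)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = np.hstack([latent, 2 * latent, -latent]) + 0.05 * rng.normal(size=(200, 3))

# Center the data, then take the SVD; rows of `components` are the principal directions
centered = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Share of total variance captured by each component
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Project onto the first principal component (3-D -> 1-D)
projected = centered @ components[:1].T
print(explained_variance_ratio)
```

Because the three columns are noisy multiples of one latent variable, almost all of the variance should land on the first component.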

## Support Vector Machines (SVM)
SVM is a supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest margin; kernel functions extend this to non-linear decision boundaries.

## K-Means Clustering
K-Means is an unsupervised clustering algorithm that partitions data into 'k' clusters based on similarity, with each cluster represented by its mean.

## Hierarchical Clustering
Hierarchical Clustering is another unsupervised clustering technique. It builds a tree (dendrogram) of nested clusters, and the tree can be cut at different heights to obtain coarser or finer groupings.
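
A minimal sketch of agglomerative clustering with SciPy, assuming two synthetic blobs as input (the data and variable names are illustrative). The same merge tree is cut at two levels to show the different granularities.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated synthetic blobs (illustrative data, not from this notebook)
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(10, 2)),
])

# Build the merge tree bottom-up using Ward linkage
merge_tree = linkage(points, method="ward")

# Cut the same tree into at most 2 clusters, then at most 4 clusters
labels_2 = fcluster(merge_tree, t=2, criterion="maxclust")
labels_4 = fcluster(merge_tree, t=4, criterion="maxclust")
print(labels_2)
print(labels_4)
```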

## Naive Bayes
Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that features are conditionally independent given the class, hence the "naive" label.

## Decision Trees
Decision Trees are tree-like structures used for both classification and regression tasks. They split data based on features to make predictions.

## Random Forest
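Random Forest is an ensemble learning method that trains many decision trees on bootstrap samples of the data, using random subsets of features at each split, and combines their outputs (by voting or averaging) for more accurate and more stable predictions.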

## Gradient Boosting
Gradient Boosting is another ensemble technique. It adds weak learners (usually shallow decision trees) one at a time, each fitted to correct the errors of the ensemble built so far, to produce a strong predictive model.

## Markov Models
Markov Models are used to represent sequential data where the future state depends only on the current state, making them useful for time-series analysis.
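
The "future depends only on the current state" property can be sketched by simulating a two-state chain from a transition matrix. The weather states and probabilities here are made up for illustration.

```python
import numpy as np

# Hypothetical two-state weather chain; row i holds P(next state | current state i)
states = ["sunny", "rainy"]
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Simulate: each step looks only at the current state's row of P
rng = np.random.default_rng(0)
current = 0
path = [current]
for _ in range(1000):
    current = int(rng.choice(2, p=P[current]))
    path.append(current)

# Long-run (stationary) distribution: left eigenvector of P with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = stationary / stationary.sum()

empirical_sunny = path.count(0) / len(path)
print(dict(zip(states, stationary)), empirical_sunny)
```

For this matrix the stationary distribution is 5/6 sunny and 1/6 rainy, and the simulated share of sunny steps should come out close to that.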

## Hidden Markov Models (HMM)
HMM is an extension of Markov Models in which the state sequence itself is hidden: only the outputs emitted by each state are observed, and the states must be inferred from them. HMMs are widely used in speech recognition and natural language processing.
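
One standard computation on an HMM is the likelihood of an observation sequence, obtained with the forward algorithm. The sketch below uses made-up start, transition, and emission probabilities for a two-state, three-symbol model.

```python
import numpy as np

# Hypothetical model: 2 hidden states, 3 possible observation symbols
start = np.array([0.6, 0.4])            # P(first hidden state)
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])          # P(next hidden state | current hidden state)
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])      # P(observed symbol | hidden state)

def forward_likelihood(observations):
    """Return P(observations), summing over all hidden state paths."""
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = start * emit[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]
    return alpha.sum()

likelihood = forward_likelihood([0, 1, 2])
print(likelihood)
```

For long sequences the values in `alpha` underflow, so practical implementations rescale at each step or work in log space.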

## Gibbs Sampling
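Gibbs Sampling is a Markov Chain Monte Carlo (MCMC) technique for sampling from a complex joint distribution by repeatedly drawing each variable from its conditional distribution given the current values of the others. It is often used in Bayesian statistics.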
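
A small sketch of the idea, assuming the target is a standard bivariate normal with correlation `rho` (chosen because both conditionals are known normals). The sample count and burn-in length are arbitrary choices for this example.

```python
import numpy as np

rho = 0.8                       # target correlation (illustrative choice)
n_samples, burn_in = 5000, 500  # arbitrary settings for this sketch
rng = np.random.default_rng(0)

# For a standard bivariate normal: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x
cond_std = np.sqrt(1.0 - rho**2)
x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples + burn_in):
    x = rng.normal(rho * y, cond_std)   # update x given the current y
    y = rng.normal(rho * x, cond_std)   # update y given the new x
    if i >= burn_in:                    # discard early draws taken before the chain settles
        samples[i - burn_in] = (x, y)

estimated_rho = np.corrcoef(samples.T)[0, 1]
print(estimated_rho)
```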

## Long Short-Term Memory (LSTM)
LSTM is a type of recurrent neural network (RNN) designed to capture long-term dependencies in sequential data, commonly used in natural language processing and time-series analysis.

## Gated Recurrent Units (GRU)
GRU is another type of RNN that simplifies the LSTM architecture, using two gates (update and reset) and no separate cell state, while still capturing temporal dependencies in sequential data.
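
To make the gating concrete, here is a sketch of a single GRU step in plain NumPy with randomly initialized, untrained weights. The parameter names and sizes are assumptions for illustration, and references differ on whether the update gate multiplies the old state or the candidate.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev, params):
    """One GRU time step; params holds input (W), recurrent (U), and bias (b) weights per gate."""
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    h_candidate = np.tanh(params["W_h"] @ x + params["U_h"] @ (r * h_prev) + params["b_h"])
    # Blend the old state with the candidate (one common convention)
    return (1.0 - z) * h_prev + z * h_candidate

# Illustrative sizes and random (untrained) weights
num_features, hidden_size, sequence_length = 3, 4, 5
rng = np.random.default_rng(0)
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_size, num_features))
    params[f"U_{gate}"] = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

# Run the cell over a short random sequence, carrying the hidden state forward
sequence = rng.normal(size=(sequence_length, num_features))
h = np.zeros(hidden_size)
for x_t in sequence:
    h = gru_step(x_t, h, params)
print(h)
```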

## Autoencoders
Autoencoders are neural network architectures used for unsupervised learning and dimensionality reduction by reconstructing input data from a compressed representation.

## Variational Autoencoders (VAE)
VAE is an extension of autoencoders that incorporates probabilistic modeling, allowing for generative capabilities and learning underlying data distributions.

## t-SNE (t-Distributed Stochastic Neighbor Embedding)
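t-SNE is a non-linear dimensionality reduction technique used mainly for visualizing high-dimensional data in two or three dimensions. It preserves local neighborhoods (similar points stay close together), but distances between well-separated clusters in the embedding are not reliable.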

## UMAP (Uniform Manifold Approximation and Projection)
UMAP is another non-linear dimensionality reduction technique, often used as a faster alternative to t-SNE. It is generally reported to retain more of the data's global structure while still preserving local neighborhoods.

## Survival Analysis
Survival Analysis is used to analyze time-to-event data, often employed in medical research to model patient survival rates.
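
A common starting point is the Kaplan-Meier estimator of the survival curve, which handles censored subjects (those whose event was not observed before follow-up ended). The sketch below is a plain NumPy version run on made-up durations; `kaplan_meier` is a helper written for this example, not a library function.

```python
import numpy as np

def kaplan_meier(durations, event_observed):
    """Return event times and the estimated survival probability just after each one."""
    durations = np.asarray(durations, dtype=float)
    event_observed = np.asarray(event_observed, dtype=bool)
    event_times = np.unique(durations[event_observed])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                     # still under observation at t
        events = np.sum((durations == t) & event_observed)   # events occurring exactly at t
        s *= 1.0 - events / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Made-up follow-up times; 1 = event observed, 0 = censored
durations = [2, 3, 3, 5, 8, 8, 9, 12]
event_observed = [1, 1, 0, 1, 1, 0, 1, 0]
times, surv = kaplan_meier(durations, event_observed)
print(times)
print(surv)
```

Censored subjects still count in the at-risk total up to their last follow-up time, which is what separates this estimate from simply dropping them.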

## Anomaly Detection
Anomaly Detection involves identifying rare events or outliers in data, often using statistical methods like clustering, density estimation, or machine learning models.

## Bootstrapping
Bootstrapping is a resampling technique used to estimate the sampling distribution of a statistic by repeatedly resampling from the observed data.
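
As a sketch, a percentile bootstrap confidence interval for the mean can be computed as follows. The synthetic sample and the number of resamples are illustrative choices.

```python
import numpy as np

# Synthetic observed sample (illustrative)
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=100)

# Resample with replacement and recompute the statistic each time
n_resamples = 2000
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()

# Percentile 95% confidence interval for the mean
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(data.mean(), ci_low, ci_high)
```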

## Cross-Validation
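Cross-Validation is a technique used to assess the performance of a model by repeatedly splitting the data into training and testing subsets, so that every observation is used for testing once. It gives a more reliable estimate of how the model generalizes and helps detect overfitting.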
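
The splitting itself can be sketched in a few lines. `k_fold_indices` is a hypothetical helper written for this example; in practice scikit-learn's `KFold` or `cross_val_score` would normally be used.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle once up front
    folds = np.array_split(indices, k)     # k folds of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print("train:", sorted(train_idx.tolist()), "test:", sorted(test_idx.tolist()))
```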

## Bayesian Inference
Bayesian Inference is a probabilistic approach to making statistical inferences by updating beliefs as new data becomes available.
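
A minimal worked example of belief updating, assuming a coin with unknown heads probability, a uniform Beta(1, 1) prior, and a made-up sequence of flips. With a Beta prior and Bernoulli data, the posterior is again a Beta distribution, so each observation just increments a count.

```python
# Uniform prior over the heads probability: Beta(1, 1)
prior_alpha, prior_beta = 1.0, 1.0

# Made-up observations: 1 = heads, 0 = tails (6 heads, 2 tails)
flips = [1, 0, 1, 1, 0, 1, 1, 1]

# Conjugate update: each head adds 1 to alpha, each tail adds 1 to beta
alpha, beta = prior_alpha, prior_beta
for flip in flips:
    alpha += flip
    beta += 1 - flip

posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)  # Beta(7, 3), posterior mean 0.7
```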

## Hypothesis Testing
Hypothesis Testing is a method for making inferences about population parameters based on sample data, helping to decide if an observed effect is statistically significant.

## Regularization
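Regularization techniques like L1 and L2 regularization help prevent overfitting in machine learning models by adding a penalty on the size of the model's parameters to the loss function. L1 penalizes the sum of absolute values and tends to drive some coefficients to exactly zero; L2 penalizes the sum of squares and shrinks all coefficients toward zero.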
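
The shrinkage effect of an L2 penalty can be sketched with the closed-form ridge regression solution on synthetic data. `ridge_fit`, the data, and the penalty strength `lam` are all assumptions made for this example.

```python
import numpy as np

def ridge_fit(features, targets, lam):
    """Solve min ||features @ w - targets||^2 + lam * ||w||^2 in closed form."""
    n_features = features.shape[1]
    gram = features.T @ features + lam * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ targets)

# Synthetic linear data with noise (illustrative)
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
targets = features @ true_w + rng.normal(scale=0.5, size=50)

w_ols = ridge_fit(features, targets, lam=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_fit(features, targets, lam=10.0)  # L2 penalty shrinks the coefficients
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The penalized coefficient vector has a smaller norm than the unpenalized one, which is the shrinkage that reduces variance at the cost of some bias.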


In [None]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from hmmlearn import hmm  # third-party package; may need `pip install hmmlearn` first

In [None]:
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from tensorflow.keras.layers import LSTM, GRU, Input, Dense
from tensorflow.keras.models import Model
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn import datasets

In [None]:
# StandardScaler, train_test_split, and SVC are already imported in the cells above
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_score
from scipy import stats

In [None]:
iris = load_iris()
X, y = iris.data, iris.target

# Principal Component Analysis (PCA)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Support Vector Machines (SVM)
svm = SVC()
svm.fit(X, y)

# K-Means Clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible clusters
kmeans_labels = kmeans.fit_predict(X)

# Naive Bayes
nb = GaussianNB()
nb.fit(X, y)

# Decision Trees
dt = DecisionTreeClassifier()
dt.fit(X, y)

# Random Forest
rf = RandomForestClassifier()
rf.fit(X, y)

# Gradient Boosting
gb = GradientBoostingClassifier()
gb.fit(X, y)

# t-SNE
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)

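# Long Short-Term Memory (LSTM)
# Illustrative shapes (assumed values): iris is not sequential data, so the model is only built here, not trained
sequence_length, num_features, num_classes = 10, 4, 3
input_layer = Input(shape=(sequence_length, num_features))
lstm_layer = LSTM(64)(input_layer)  # 64 hidden units; returns the final hidden state
output_layer = Dense(num_classes, activation='softmax')(lstm_layer)
lstm_model = Model(inputs=input_layer, outputs=output_layer)
# categorical_crossentropy expects one-hot encoded labels
lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])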

# Anomaly Detection using Isolation Forest
iso_forest = IsolationForest(contamination=0.1)
iso_forest.fit(X)
anomaly_scores = iso_forest.decision_function(X)

# Cross-Validation
svm = SVC()
scores = cross_val_score(svm, X, y, cv=5)  # 5-fold cross-validation

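# Hypothesis Testing (two-sample t-test)
# Illustrative samples (assumed choice): sepal length of setosa (class 0) vs. versicolor (class 1)
sample_A, sample_B = X[y == 0, 0], X[y == 1, 0]
t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)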

# Regularization (L2 Regularization with Logistic Regression)
X, y = datasets.load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
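# C is the inverse regularization strength: smaller C means a stronger L2 penalty
logreg = LogisticRegression(penalty='l2', C=1.0)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))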