### Advanced machine learning
#### Decision Trees
A decision tree is a flowchart-like supervised learning model that recursively splits the data on feature thresholds until it reaches a prediction at a leaf. Decision trees are easy to understand and interpret, which is one of their biggest advantages.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
from sklearn.tree import DecisionTreeClassifier

# Initialize the model
clf = DecisionTreeClassifier()

# Fit the model to the training data
clf.fit(X_train, y_train)

# Predict on the test data
predictions = clf.predict(X_test)
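
To make the "easy to interpret" claim concrete, here is a minimal sketch that prints the learned splits as text rules. The depth limit of 3 and the `feature_i` names are assumptions chosen only to keep the printout short; the data is the same synthetic set as above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Same synthetic data as above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A shallow tree (max_depth=3 is an arbitrary choice) keeps the printout readable
small_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
small_tree.fit(X_train, y_train)

# export_text renders the learned splits as nested if/else rules
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
rules = export_text(small_tree, feature_names=feature_names)
print(rules)

test_accuracy = small_tree.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Each indented line in the printout is one split, so you can follow a sample from the root down to its predicted class.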

#### Random Forest
Random Forest is a popular supervised learning algorithm that can be used for both classification and regression problems. It is based on ensemble learning: it trains many decision trees, each on a bootstrap sample of the data with a random subset of features considered at each split, and combines their predictions into a single answer.

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the model
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model
forest.fit(X_train, y_train)

# Make predictions
forest_predictions = forest.predict(X_test)
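
The ensemble idea can be checked directly. The sketch below assumes scikit-learn's behavior of averaging the class probabilities of the individual trees, and rebuilds the forest's output by hand from `forest.estimators_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same synthetic data as above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Each fitted tree is available in forest.estimators_
per_tree_proba = np.stack([tree.predict_proba(X_test) for tree in forest.estimators_])

# Averaging the per-tree probabilities should reproduce the forest's own output
averaged_proba = per_tree_proba.mean(axis=0)
matches_forest = np.allclose(averaged_proba, forest.predict_proba(X_test))
print(f"Number of trees: {len(forest.estimators_)}")
print(f"Manual average matches forest.predict_proba: {matches_forest}")
```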

#### Naive Bayes
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every feature is assumed to be independent of every other feature, given the class.

In [None]:
from sklearn.naive_bayes import GaussianNB

# Initialize the model
nb = GaussianNB()

# Fit the model
nb.fit(X_train, y_train)

# Make predictions
nb_predictions = nb.predict(X_test)
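
Written out, the independence assumption turns Bayes' Theorem into a simple product. The notation here is an assumption for illustration: $y$ is the class and $x_1, \dots, x_n$ are the feature values.

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

For `GaussianNB`, each per-feature likelihood is modeled as a normal distribution with a class-specific mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$:

$$
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}} \exp\!\left(-\frac{(x_i - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)
$$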

#### K-Nearest Neighbors (KNN)
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in machine learning. It is a supervised learning method that classifies a new point by looking at the labels of the k closest training points, and it is widely used in pattern recognition, data mining and intrusion detection.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Initialize the model
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model
knn.fit(X_train, y_train)

# Make predictions
knn_predictions = knn.predict(X_test)
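
To show what happens under the hood, here is a rough from-scratch sketch of the prediction step. `knn_predict` is a hypothetical helper written for this illustration (Euclidean distance, simple majority vote), not how scikit-learn implements it, and the toy points are made up.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: three points near the origin (class 0), three near (5, 5) (class 1)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.5, 0.5])))  # query close to the class-0 points
print(knn_predict(X_toy, y_toy, np.array([5.5, 5.5])))  # query close to the class-1 points
```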

#### Support Vector Machines (SVM)
SVM is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate, and then look for the hyperplane that separates the classes with the largest margin.

In [None]:
from sklearn import svm

# Initialize the model
svc = svm.SVC(kernel='linear')

# Fit the model
svc.fit(X_train, y_train)

# Make predictions
svc_predictions = svc.predict(X_test)
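
With a linear kernel, the fitted hyperplane can be read straight off the model. This sketch uses a small two-cluster toy dataset from `make_blobs` (an assumption, chosen so it runs quickly) and checks that `w · x + b` reproduces `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two-class toy data in 2 dimensions
X_toy, y_toy = make_blobs(n_samples=100, centers=2, random_state=0)

svc = SVC(kernel='linear')
svc.fit(X_toy, y_toy)

# For a linear kernel the hyperplane is w . x + b = 0
w = svc.coef_[0]
b = svc.intercept_[0]
manual_scores = X_toy @ w + b

# The sign of the score tells us on which side of the hyperplane a point lies
scores_match = np.allclose(manual_scores, svc.decision_function(X_toy))
print(f"Manual scores match decision_function: {scores_match}")
print(f"Number of support vectors: {svc.support_vectors_.shape[0]}")
```

Only the support vectors (the points on or inside the margin) determine the hyperplane; the remaining training points could be removed without changing the fit.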

#### Ensemble methods
Ensemble methods are techniques that combine predictions from multiple machine learning algorithms to deliver more accurate predictions than a single model.
##### Bagging
Bagging, or Bootstrap Aggregating, involves drawing multiple random samples (typically with replacement) from your original dataset, building a separate model on each sample, and then combining the outputs of all these models by voting or averaging. For instance, the Random Forest algorithm is a bagging method built on decision trees.

In [None]:
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

bagging = BaggingClassifier(KNeighborsClassifier(), max_samples=0.5, max_features=0.5)
bagging.fit(X_train, y_train)
predictions = bagging.predict(X_test)
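
The "bootstrap" part is just sampling row indices with replacement. As a quick illustration (the sample size and seed are arbitrary), a bootstrap sample of the same size as the dataset contains roughly 63% of the distinct original rows, so every model in the ensemble sees a slightly different dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Draw n_samples row indices with replacement: one bootstrap sample
bootstrap_indices = rng.integers(0, n_samples, size=n_samples)

# Some rows are drawn several times, others not at all
unique_fraction = np.unique(bootstrap_indices).size / n_samples
print(f"Fraction of distinct rows in the bootstrap sample: {unique_fraction:.3f}")
```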

##### Boosting
Boosting works by training a model, identifying the mistakes it made, and then building a new model that focuses on the mistakes of the first model. This process is repeated, each time focusing on the mistakes of the last model, until a combined model with a low error rate is obtained. An example is the AdaBoost algorithm.

In [None]:
from sklearn.ensemble import AdaBoostClassifier

adaboost = AdaBoostClassifier(n_estimators=100)
adaboost.fit(X_train, y_train)
predictions = adaboost.predict(X_test)
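
To watch the ensemble improve round by round, `staged_score` reports the accuracy after each boosting iteration. This is a sketch on the same synthetic data as above; the checkpoints printed are arbitrary, and no particular accuracy is implied.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Same synthetic data as above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

adaboost = AdaBoostClassifier(n_estimators=100, random_state=42)
adaboost.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round
staged_accuracy = list(adaboost.staged_score(X_test, y_test))

# Boosting can stop early, so only print checkpoints that were actually reached
for n_rounds in (1, 10, 50, 100):
    if n_rounds <= len(staged_accuracy):
        print(f"After {n_rounds:3d} rounds: test accuracy {staged_accuracy[n_rounds - 1]:.3f}")
```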

##### Stacking
Stacking involves training multiple different models and then using another machine learning model to combine their outputs.

In [None]:
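from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Base models whose predictions are fed to the final estimator
estimators = [('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
              ('svc', SVC(random_state=42))]

# Logistic regression learns how to combine the base models' outputs
stacking = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stacking.fit(X_train, y_train)
predictions = stacking.predict(X_test)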

#### Overfitting and Underfitting

Overfitting and underfitting refer to the phenomena when a machine learning model performs well on the training data but poorly on the test data (overfitting), or when it performs poorly on both the training data and the test data (underfitting).

* Overfitting occurs when the model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. It means the model has learned the data "too well".
* Underfitting occurs when a machine learning model cannot capture the underlying pattern of the data. These models usually have poor predictive performance.

In essence, underfitting is a model with high bias (it makes strong assumptions and oversimplifies the problem), and overfitting is a model with high variance (it models the random noise in the training data, not the intended outputs).

Let's demonstrate underfitting and overfitting using the decision tree algorithm. We will use the depth of the tree as our tuning parameter.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Create a moon-shaped, noisy dataset
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Let's set the tree depth to 1 (A stump)
stump = DecisionTreeClassifier(max_depth=1, random_state=42)
stump.fit(X_train, y_train)

# Now let's set the tree depth to 6
tree = DecisionTreeClassifier(max_depth=6, random_state=42)
tree.fit(X_train, y_train)

# And finally, let's not limit the tree depth
deep_tree = DecisionTreeClassifier(max_depth=None, random_state=42)
deep_tree.fit(X_train, y_train)

# Now let's test the accuracy of each model on the training data and test data
models = [stump, tree, deep_tree]
for model in models:
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    print(f'Model with max_depth={model.max_depth}:')
    print(f'Training accuracy: {accuracy_score(y_train, y_train_pred)}')
    print(f'Test accuracy: {accuracy_score(y_test, y_test_pred)}\n')

In the output, you might observe that the model with a depth of 1 (the stump) performs poorly on both the training and test sets. This is a classic case of underfitting.

On the other hand, the model with no limit on the tree depth might perform very well on the training set but poorly on the test set. This is a classic case of overfitting.

The model with a depth of 6 might give the best results, as it might strike a balance and perform well on both the training set and the test set.

One way to fine-tune the trade-off between underfitting and overfitting is to use cross-validation to find the optimal model complexity. Moreover, many models have regularization parameters that can be adjusted to avoid overfitting.
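
As a minimal sketch of that idea, the snippet below uses 5-fold cross-validation to pick `max_depth` for the decision tree on the moons data. The candidate depths and the number of folds are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same noisy moons dataset as above
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate tree depths to compare (None means no depth limit)
param_grid = {"max_depth": [1, 2, 3, 4, 6, 8, None]}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print(f"Best max_depth: {search.best_params_['max_depth']}")
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy of the refitted best model: {search.score(X_test, y_test):.3f}")
```

Because the depth is chosen using only the training folds, the final test accuracy remains an honest estimate of performance on new data.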

### Unsupervised learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data. Another common technique is dimensionality reduction, which attempts to reduce the number of features in a dataset while preserving as much statistical information as possible.

#### Clustering
##### DBSCAN
Density-Based Spatial Clustering of Applications with Noise, or DBSCAN, is a density-based clustering algorithm: it identifies core samples in high-density regions and expands clusters from them. Points that are not reachable from any core sample are labeled as noise.

In [None]:
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

dbscan = DBSCAN(eps=0.3, min_samples=5)
dbscan.fit(X)

# Plot the cluster assignments
plt.scatter(X[:, 0], X[:, 1], c=dbscan.labels_)
plt.show()
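
Besides plotting, the labels can be inspected directly. In scikit-learn, DBSCAN marks noise points with the label `-1`, so a short sketch like this counts the clusters found and the points left unassigned.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Same two-moons data as above
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Noise points get the label -1, so exclude it when counting clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"Clusters found: {n_clusters}, noise points: {n_noise}")
```

Unlike k-means, the number of clusters is not passed in; it follows from `eps` and `min_samples`.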

##### LDA (Latent Dirichlet Allocation)
This is a generative statistical model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar. It is most commonly used for topic modeling in natural language processing, where the observations are words in documents and the unobserved groups are topics.

In [None]:
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Sample data
data = ['This is the first document.','This document is the second document.','And this is the third one.','Is this the first document?']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(data)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

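# Displaying topics (top 10 words per topic with their weights)
feature_names = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
for idx, topic in enumerate(lda.components_):
    print("Topic %d:" % (idx))
    print([(feature_names[i], topic[i])
                    for i in topic.argsort()[:-10 - 1:-1]])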
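
The fitted model can also describe each document as a mixture of topics. This sketch reuses the four sample sentences from above and calls `transform`, which returns one row per document with the topic proportions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Same sample documents as above
docs = ['This is the first document.', 'This document is the second document.',
        'And this is the third one.', 'Is this the first document?']

counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# One row per document, one column per topic; each row sums to 1
doc_topics = lda.transform(counts)
for doc, mixture in zip(docs, doc_topics):
    print(f"{np.round(mixture, 2)}  {doc}")
```

With only four short sentences the topics themselves are not meaningful; the point is the shape and meaning of the output.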

#### Dimensionality Reduction

##### PCA (Principal Component Analysis)
This is a technique used for feature extraction. It combines our input variables into new, uncorrelated variables (the principal components), ordered by how much variance they explain, so we can drop the "least important" components while still retaining the most valuable parts of all of the original variables.

In [None]:
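from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# X must be a matrix with shape (n_samples, n_features) and y the matching labels.
# Regenerate the labeled data here, because X currently holds the LDA word counts.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)

# Project the 20 features onto the first 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.show()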
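
How much is lost by dropping components can be measured with `explained_variance_ratio_`. The sketch below uses the synthetic classification data from earlier in this notebook; keeping 5 components is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data with 20 features, as used earlier
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)

pca = PCA(n_components=5).fit(X)

# Fraction of the total variance captured by each component, largest first
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
for i, (ratio, total) in enumerate(zip(ratios, cumulative), start=1):
    print(f"Component {i}: {ratio:.3f} of the variance (cumulative {total:.3f})")
```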

##### t-SNE (t-Distributed Stochastic Neighbor Embedding)
t-SNE is a tool for visualizing high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.

In [None]:
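from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Again, X must be a matrix with shape (n_samples, n_features) and y the matching labels.
# The labeled data is regenerated here so that this cell runs on its own.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)

# t-SNE is stochastic, so fix the seed for a reproducible embedding
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
plt.show()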

For both PCA and t-SNE examples, the input X should be a matrix with shape (n_samples, n_features). y is only used for coloring the points in the plot, and represents the true labels of the samples.

### Neural Networks and Deep Learning
Neural networks are a set of algorithms, loosely modeled after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. Deep learning is the subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain.

In more concrete terms, deep learning is the name for multilayered neural networks, which are networks composed of several "layers" of nodes connected in a "deep" structure.

Here's an example of a simple feedforward neural network built with PyTorch:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Set up a simple feedforward neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(20, 50)  # 20 input units to 50 hidden units
        self.fc2 = nn.Linear(50, 1)  # 50 hidden units to 1 output unit

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # Activation function for hidden layer
        x = self.fc2(x)  # No activation function for output layer
        return x

# Assume we have some data in X and targets in Y
X = torch.randn(100, 20)
Y = torch.randn(100, 1)

# Instantiate the network, loss function and optimizer
net = Net()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Train the network
for epoch in range(100):  # loop over the dataset multiple times
    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(X)
    loss = criterion(outputs, Y)
    loss.backward()
    optimizer.step()

print('Finished Training')

In this code, we define a simple network with one hidden layer and one output layer. The hidden layer uses the ReLU activation function. We're training this network to minimize the mean squared error between its outputs and the target values Y. We're using PyTorch's SGD optimizer; because every step here uses the full dataset rather than a mini-batch, this is effectively plain (full-batch) gradient descent.

This is a simple example, but deep learning can get much more complex! In real-world scenarios, we often use much larger networks and train them on big datasets. This requires more computational resources (especially GPUs), and more sophisticated techniques for managing data and training dynamics.

Additionally, you would typically want to separate your data into a training set and a validation set, so that you can monitor your network's performance on unseen data as it trains, and prevent overfitting.
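
Here is a rough sketch of what that could look like for the toy setup above. The 80/20 split, the `nn.Sequential` model and the printing interval are all assumptions made for brevity; since the data is random noise, the validation loss is not expected to improve much.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic data, as in the example above (random inputs and targets)
X = torch.randn(100, 20)
Y = torch.randn(100, 1)

# Hold out the last 20 samples for validation (an 80/20 split is an arbitrary choice)
X_train, X_val = X[:80], X[80:]
Y_train, Y_val = Y[:80], Y[80:]

# Same architecture as Net above, written compactly
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

val_losses = []
for epoch in range(100):
    # Training step on the training split only
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), Y_train)
    loss.backward()
    optimizer.step()

    # Evaluate on the held-out split without tracking gradients
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), Y_val).item()
    val_losses.append(val_loss)

    if (epoch + 1) % 20 == 0:
        print(f"epoch {epoch + 1}: train loss {loss.item():.4f}, val loss {val_loss:.4f}")
```

A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting, and a common cue to stop training.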

Finally, deep learning also includes other types of architectures, such as convolutional neural networks (CNNs) for image tasks, recurrent neural networks (RNNs) for sequential data, transformers for natural language processing, autoencoders for unsupervised learning, and many more. Deep learning is my bread and butter, so please check out other corners of my GitHub to learn more about these fascinating structures!