# Chapter 1

- Unsupervised learning: machine learning on unlabeled data
- Pure pattern discovery: learning without guidance from labels
- Dimension = number of features (columns) in the dataset
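- Algorithms:
    - Clustering: organize data into groups of similar samples
        - k-means:
            - Splits the samples (rows in dataset) into a chosen number of clusters, k
            - Calculates the mean of each cluster and uses it as the cluster's centroid
            - Assigns each sample to its nearest centroid, then recomputes the centroids; repeats until the assignments stop changing
            - Inertia: sum of squared distances from each sample to its centroid. Good clustering has low inertia, but inertia keeps falling as k grows, so pick k at the "elbow" where the decrease slows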
        - Scatterplot visualization: plot the samples colored by their cluster label
        - t-SNE:
            - Maps data from a high-dimensional space to 2 (or 3) dimensions for visualization
            - A black box: the axes have no interpretable meaning; it only gives insight into how many clusters there are and how well they separate
            - Stochastic: produces a different result on each run
        - Hierarchical clustering:
            - Organizes samples into a tree of nested subgroups (drawn as a dendrogram)
            - At the start, every sample (leaf) is its own cluster
            - At each step, the two closest clusters merge; "closest" is defined by the linkage method
                - For n samples, n-1 merge steps complete the whole process
                - Complete linkage: distance between clusters = maximum distance between their samples
                - Single linkage: distance between clusters = minimum distance between their samples
            - At the final step there is only one cluster ("agglomerative" clustering)
            - Doing the reverse, by repeatedly splitting one cluster, is divisive clustering
    - Dimension reduction: remove redundant features to produce a simpler model
        - Treats the less informative features as noise
        - Keeps the pattern information in a compressed form
        - PCA:
            - Step 1, decorrelation:
                - Principal components = the directions (axes) along which the samples vary the most
                - Shifts the data so its mean becomes 0, then rotates it so the principal components align with the coordinate axes
                - No information is lost
                - Because of the rotation, correlated features become decorrelated
            - Step 2, dimension reduction:
                - Intrinsic dimension = number of features needed to approximate the dataset
                - Keep only the high-variance PCA features; their count estimates the intrinsic dimension
        - TruncatedSVD: PCA alternative that works on sparse datasets, where most entries are 0 (e.g. tf-idf word-frequency arrays)
- Market basket analysis: find items that are frequently bought together (e.g. pencil and eraser)
- Anomaly detection: find data points outside the normal range (outliers), e.g. suspicious credit card transactions
- Overview chart of ML algorithms: https://mlforall.files.wordpress.com/2019/09/machinelearningalgorithms.png
- Another overview chart: https://www.theinsaneapp.com/wp-content/uploads/2022/02/Machine-Learning-Algorithms-PDF.png
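
The k-means steps above can be sketched from scratch in NumPy. This is a minimal illustration of the standard Lloyd iteration, not scikit-learn's implementation (which adds k-means++ initialization and several restarts); the function names `assign` and `kmeans`, the random initialization from the samples, and the two synthetic blobs are assumptions for illustration.

```python
import numpy as np


def assign(samples, centroids):
    """Index of the nearest centroid for every sample."""
    # Distances from every sample to every centroid: shape (n_samples, k)
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(samples, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm); samples has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct samples picked at random
    centroids = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign(samples, centroids)
        # New centroid = mean of the samples assigned to it (keep the old one if a cluster is empty)
        new_centroids = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    labels = assign(samples, centroids)
    # Inertia: sum of squared distances from each sample to its own centroid
    inertia = float(((samples - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


# Two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(42)
samples = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])
labels, centroids, inertia = kmeans(samples, k=2)
print(centroids)
print(inertia)
```

Recomputing the labels after the loop guarantees that the returned labels, centroids and inertia all refer to the same final centroids.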

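Market basket analysis usually starts by counting how often item pairs occur in the same transaction (their *support*), which is the first step of association-rule algorithms such as Apriori. A minimal pure-Python sketch; the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together
baskets = [
    {"pencil", "eraser", "notebook"},
    {"pencil", "eraser"},
    {"pen", "notebook"},
    {"pencil", "eraser", "pen"},
]

# Count how many baskets contain each unordered pair of items
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets that contain both items
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, support[top_pair])  # ('eraser', 'pencil') 0.75
```

Sorting each basket before taking combinations makes `("eraser", "pencil")` and `("pencil", "eraser")` count as the same pair.
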
### K-Means

```
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Scaling (standardization): rescale each feature to mean 0 and variance 1
scaler = StandardScaler()
samples = df.drop("target", axis=1)
scaler.fit(samples)
samples_scaled = scaler.transform(samples)

model = KMeans(init="random", n_clusters=3, n_init=5, max_iter=100)
model.fit(samples_scaled)
labels = model.predict(samples_scaled)
# labels = model.fit_predict(samples_scaled)  # Alternative: fit and predict in one call
df['predicted_labels'] = labels
# Cross-tabulate clusters against known labels (how well the clusters match the classes)
ct = pd.crosstab(df['predicted_labels'], df['target'])
print(ct)
# Cluster quality, inertia: sum of squared distances from samples to their centroid (lower = tighter)
print(model.inertia_)
# Centroid coordinates (can be overlaid on a scatterplot)
centroids = model.cluster_centers_
# Number of iterations until convergence
print(model.n_iter_)
# Elbow method: plot inertia for a range of k and choose the "elbow" where the decrease slows
k_values = range(1, 7)
inertia_values = []
for k in k_values:
    inertia_values.append(KMeans(n_clusters=k, n_init=10).fit(samples_scaled).inertia_)
plt.plot(k_values, inertia_values, marker='o')
plt.xlabel('number of clusters, k')
plt.ylabel('inertia')
plt.show()

### k-means with a pipeline (scaling and clustering in one estimator)
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
scaler = StandardScaler()
kmeans = KMeans(n_clusters=3)
pipeline = make_pipeline(scaler, kmeans)
pipeline.fit(samples)
labels = pipeline.predict(samples)
```
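
The snippets above assume an existing DataFrame `df`. As a self-contained sketch, the same scale-then-cluster pipeline can be run on synthetic data from `make_blobs` (an assumption; any numeric feature matrix works), together with the inertia values used for the elbow plot. `n_init=10` and `random_state=0` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df: 300 samples in 3 blobs, with known labels y
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X[:, 1] *= 1000  # put the second feature on a much larger scale, as real features often are

# Scaling inside the pipeline keeps the large-scale feature from dominating the distances
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = pipeline.fit_predict(X)
print(pd.crosstab(labels, y, rownames=["cluster"], colnames=["true label"]))

# Inertia for k = 1..6 on the scaled data: the values to plot when looking for the elbow
X_scaled = StandardScaler().fit_transform(X)
k_values = range(1, 7)
inertia_values = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_ for k in k_values
]
print(np.round(inertia_values, 1))
```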

# Chapter 2

### Hierarchical clustering

```
import matplotlib.pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

samples = df.drop("target", axis=1)
mergings = linkage(samples.values, method='complete')
# Visualize the merge tree
dendrogram(mergings, labels=df["target"].values, leaf_rotation=90, leaf_font_size=6)
plt.show()
# Cut the tree at height 15: merges above this distance are undone, giving flat cluster labels
labels = fcluster(mergings, 15, criterion='distance')
print(labels)
df['predicted_labels'] = labels
# Cross-tabulate clusters against known labels
ct = pd.crosstab(df['predicted_labels'], df['target'])
print(ct)
```
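
To see what `linkage` returns, here is a tiny sketch on five hypothetical 1-D samples. Each row of the result records one merge, so n samples give n-1 rows, and the third column holds the merge distance, which differs between single and complete linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Five hypothetical 1-D samples: one group at 0-2, another at 10-11
samples = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])

heights = {}
for method in ("single", "complete"):
    mergings = linkage(samples, method=method)
    # Each row is one merge: [cluster index, cluster index, merge distance, size of new cluster]
    heights[method] = mergings[:, 2]
    print(method, mergings.shape[0], "merges at distances", heights[method])

# Cut the complete-linkage tree at distance 5: the final merge (at 11) is undone, leaving 2 clusters
complete = linkage(samples, method="complete")
labels = fcluster(complete, 5, criterion="distance")
print(labels)
```

With single linkage the last merge happens at distance 8 (the closest cross-group pair, 2 and 10); with complete linkage it happens at 11 (the farthest pair, 0 and 11).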

### t-SNE

```
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
samples = df.drop("target", axis=1)
# Tune learning_rate: try values between 50 and 200
model = TSNE(learning_rate=100)
# NOTE: TSNE has no separate transform method, so it cannot embed new samples; fitting and transforming happen together
transformed = model.fit_transform(samples.values)
# Visualize the clusters, coloring each point by its known label
xs = transformed[:, 0]
ys = transformed[:, 1]
plt.scatter(xs, ys, c=df["target"].values)
plt.show()
```
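
The run-to-run variability of t-SNE can be checked directly. This sketch (assuming the Iris data bundled with scikit-learn as input) embeds the same samples twice with different `random_state` values. `init="random"` is chosen on purpose, because the PCA initialization that recent scikit-learn versions use by default tends to make runs much more alike; fixing `random_state` is what makes a single run reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data  # 150 samples, 4 features

# Two runs that differ only in random_state
emb_a = TSNE(n_components=2, learning_rate=100, init="random", random_state=0).fit_transform(samples)
emb_b = TSNE(n_components=2, learning_rate=100, init="random", random_state=1).fit_transform(samples)

print(emb_a.shape)                # one 2-D point per sample
print(np.allclose(emb_a, emb_b))  # the coordinates differ between the two runs
```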

# Chapter 3

### PCA

```
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

samples = df.drop("target", axis=1).values
model = PCA(n_components=2)
model.fit(samples)
# Per-feature mean of the samples: mean
mean = model.mean_
# First principal component (direction of greatest variance): first_pc
first_pc = model.components_[0, :]
# Visualize direction of the component
plt.arrow(mean[0] , mean[1], first_pc[0], first_pc[1], color='red', width=0.01)
plt.axis('equal')
plt.show()

transformed = model.transform(samples)
# Principal Components
principal_components = model.components_
# Variance explained by each principal component (fit PCA() without n_components to see all of them)
features = range(model.n_components_)
plt.bar(features, model.explained_variance_)
plt.show()
# Visualize how the dataset looks after transformation
xs = transformed[:,0]
ys = transformed[:,1]
plt.scatter(xs, ys, c=df["target"].values)
plt.show()
# Contribution (absolute loadings) of the original features to each principal component
feature_names = df.drop("target", axis=1).columns
for i, pc in enumerate(principal_components):
    plt.bar(list(feature_names), np.abs(pc), label=f'PC {i + 1}', alpha=0.7)
plt.xlabel('Original Features')
plt.ylabel('Absolute Loadings')
plt.legend()
plt.show()


#### TruncatedSVD: PCA alternative for sparse datasets (mostly zeros). Sparse formats such as CSR save space by storing only the non-zero values and their positions. Unlike PCA, TruncatedSVD does not center the data, so the input stays sparse
from sklearn.decomposition import TruncatedSVD
# Apply TruncatedSVD
svd_model = TruncatedSVD(n_components=2)
transformed_svd = svd_model.fit_transform(samples)  # samples may be a dense array or a scipy.sparse.csr_matrix (e.g. tf-idf output)
# Visualize how the dataset looks after transformation
xs_svd = transformed_svd[:, 0]
ys_svd = transformed_svd[:, 1]
plt.scatter(xs_svd, ys_svd)
plt.show()
# Principal Components
svd_principal_component = svd_model.components_
# Fraction of the variance explained by each component
plt.bar(range(1, svd_model.n_components + 1), svd_model.explained_variance_ratio_)
plt.show()
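# Contribution of the original features to each component
# Column names of the original features (for tf-idf input, use tfidf.get_feature_names_out())
svd_feature_names = df.drop("target", axis=1).columns
for i, loading in enumerate(svd_principal_component):
    plt.bar(list(svd_feature_names), np.abs(loading), label=f'PC {i + 1}', alpha=0.7)
plt.legend()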
plt.show()
```
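
Both PCA steps can be checked numerically. In this sketch the 3-D data are synthetic (an assumption) and built to lie almost on a 2-D plane. After `fit_transform` the PCA features have mean 0 and zero covariance (decorrelation), and counting the components whose explained-variance ratio exceeds 1% estimates the intrinsic dimension. The 1% cutoff is an arbitrary illustrative threshold, not a rule.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 3-D data generated from 2 hidden variables, plus tiny noise, shifted off the origin
latent = rng.normal(size=(200, 2))
samples = np.column_stack([
    latent[:, 0],
    latent[:, 0] + latent[:, 1],                     # correlated with column 0
    2 * latent[:, 1] + 0.01 * rng.normal(size=200),  # nearly a combination of the other two
]) + 5.0

pca = PCA()
transformed = pca.fit_transform(samples)

# Step 1, decorrelation: PCA features are centered and have (numerically) zero covariance
print(np.round(transformed.mean(axis=0), 6))
print(np.round(np.cov(transformed.T), 6))

# Step 2, dimension reduction: count components with more than 1% of the variance (assumed cutoff)
print(np.round(pca.explained_variance_ratio_, 4))
intrinsic_dimension = int(np.sum(pca.explained_variance_ratio_ > 0.01))
print(intrinsic_dimension)
```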

### Sparse dataframe and tf-idf

```
documents = ['cats say meow', 'dogs say woof', 'dogs chase cats']
import pandas as pd
# Import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
# Create a TfidfVectorizer: tfidf
tfidf = TfidfVectorizer()
# Apply fit_transform to the documents: csr_mat (a scipy.sparse CSR matrix)
csr_mat = tfidf.fit_transform(documents)
# Print the dense version via toarray()
print(csr_mat.toarray())
# Get the words (one per column): words
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
words = tfidf.get_feature_names_out()
# Print words
print(words)
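# Create a dense dataframe from the sparse matrix
df = pd.DataFrame(data=csr_mat.toarray(), columns=words)
# Dense dataframe -> sparse dataframe (zeros are not stored)
sparse_df = df.astype(pd.SparseDtype("float", 0.0))
# Alternatively, build the sparse dataframe straight from the scipy matrix
sparse_df = pd.DataFrame.sparse.from_spmatrix(csr_mat, columns=words)
# Sparse dataframe -> scipy COO matrix
coo_mat = sparse_df.sparse.to_coo()
# Sparse dataframe -> dense dataframe
dense_df = sparse_df.sparse.to_dense()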
```
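
The tf-idf output is a CSR matrix, which stores only the non-zero values (`data`), their column indices (`indices`) and where each row starts (`indptr`). Because `TruncatedSVD` accepts such a matrix directly, it can feed `KMeans` in a pipeline to cluster documents. A minimal sketch with a hypothetical six-document corpus; `n_components=2`, `n_clusters=2` and the `random_state` values are illustrative, and with so few documents the cluster assignments are not guaranteed to follow the topics.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus with two rough topics (pets vs. markets)
documents = [
    "cats say meow", "dogs say woof", "dogs chase cats",
    "stocks fell today", "markets and stocks rose", "investors sold stocks",
]

tfidf = TfidfVectorizer()
csr_mat = tfidf.fit_transform(documents)

# CSR storage: non-zero values, their column indices, and row start offsets
print(csr_mat.shape, csr_mat.nnz)
print(csr_mat.data[:3], csr_mat.indices[:3], csr_mat.indptr)

# TruncatedSVD works on the sparse matrix without densifying it; KMeans clusters the reduced features
pipeline = make_pipeline(
    TruncatedSVD(n_components=2, random_state=0),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(csr_mat)
print(pd.DataFrame({"document": documents, "cluster": labels}).sort_values("cluster"))
```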