# Class 26: Unsupervised learning

Plan for today:
- Hierarchical clustering
- Quick demo of using an LLM/chatbot in Python


## Notes on the class Jupyter setup

If you have the *ydata123_2024a* environment set up correctly, you can get the class code by running the code below (which you have presumably already done, since you are viewing this notebook).

In [None]:
import YData

# YData.download.download_class_code(26)   # get class code    
# YData.download.download_class_code(26, True) # get the code with the answers


If you are using Google Colab, you should install the YData package by uncommenting and running the code below.

In [None]:
# !pip install https://github.com/emeyers/YData_package/tarball/master

If you are using Google Colab, you should also uncomment and run the code below to mount your Google Drive.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

In [1]:
import statistics
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
from urllib.request import urlopen

import matplotlib.pyplot as plt
%matplotlib inline

## 1. Unsupervised learning: Hierarchical clustering

In unsupervised machine learning, we try to find patterns in the data using only a set of features X (without any labels y). 

Let's explore clustering, which is a form of unsupervised learning.


In [2]:
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


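# load the penguins data set that ships with seaborn
penguins = sns.load_dataset("penguins")

# drop rows with missing values
penguins = penguins.dropna()

# shuffle the rows
penguins = penguins.sample(frac = 1)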


# get the features and the labels
X_penguin_features = penguins[['bill_length_mm', 'bill_depth_mm','flipper_length_mm', 'body_mass_g']]
y_penguin_labels = penguins['species']

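We can do hierarchical (agglomerative) clustering in scikit-learn using the `AgglomerativeClustering()` object, and we can draw dendrograms using SciPy's `hierarchy` module.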


In [1]:
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster import hierarchy

# Ward's method merges, at each step, the two clusters whose merger gives the
# smallest increase in the sum of squared differences within all clusters
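
A minimal sketch of the Ward linkage step, not necessarily the code used in class. To keep it runnable on its own, it uses a small synthetic array `X_demo` (a made-up stand-in for `X_penguin_features`), and it standardizes the features first, which is an assumption about how one might want to treat features measured on very different scales.

```python
import numpy as np
from scipy.cluster import hierarchy
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_penguin_features (assumption): 3 groups, 4 features
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 4)) for c in centers])

# Standardize so that no single feature dominates the Euclidean distances
X_scaled = StandardScaler().fit_transform(X_demo)

# Ward linkage: repeatedly merge the pair of clusters whose merger gives the
# smallest increase in the total within-cluster sum of squares
Z = hierarchy.linkage(X_scaled, method="ward")

# Each of the n - 1 rows of Z records one merge:
# [index of cluster a, index of cluster b, merge distance, size of merged cluster]
print(Z.shape)
print(Z[-1])
```

In this notebook, the same call could be applied to the penguin features, for example `hierarchy.linkage(StandardScaler().fit_transform(X_penguin_features), method="ward")`.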




In [2]:
# display a dendrogram
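
One possible way to draw the dendrogram, sketched with SciPy's `hierarchy.dendrogram`. The data here are synthetic (`X_demo` is a hypothetical stand-in for the penguin features) so that the example runs without downloading anything.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy

# Synthetic stand-in for X_penguin_features (assumption): 3 groups, 4 features
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 4)) for c in centers])

# Build the merge tree with Ward's method
Z = hierarchy.linkage(X_demo, method="ward")

# Draw the tree; the height of each join is the distance at which two clusters merged.
# With %matplotlib inline the figure renders at the end of the cell.
fig, ax = plt.subplots(figsize=(8, 4))
dendro = hierarchy.dendrogram(Z, no_labels=True, ax=ax)
ax.set_ylabel("Ward merge distance")
ax.set_title("Dendrogram (synthetic demo data)")
```

Long vertical lines near the top of the tree suggest well-separated groups; cutting the tree across those lines gives the clusters.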




In [3]:
# cluster points into 3 clusters 





# get the predicted cluster for each point
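
A sketch of how these two steps might look with scikit-learn's `AgglomerativeClustering`. The synthetic `X_demo` array and the variable names `agg` and `cluster_ids` are illustrative assumptions, not the names used in class.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_penguin_features (assumption): 3 groups of 20 points
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 4)) for c in centers])
X_scaled = StandardScaler().fit_transform(X_demo)

# Cluster the points into 3 clusters using Ward linkage
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")

# fit_predict() fits the tree and returns one integer cluster id per row
cluster_ids = agg.fit_predict(X_scaled)

# Number of points assigned to each cluster
print(np.bincount(cluster_ids))
```

The cluster ids are arbitrary integers (0, 1, 2); they carry no meaning beyond "same cluster" or "different cluster".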





In [4]:
# visualize how well the clustering matches the penguin species
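
One simple way to compare clusters with known labels is a cross-tabulation, sketched below. The labels `"A"`, `"B"`, `"C"` and the data are synthetic stand-ins for the penguin species and features, and the adjusted Rand index is an extra summary added here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-ins (assumption) for X_penguin_features and y_penguin_labels
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 4)) for c in centers])
true_labels = np.repeat(["A", "B", "C"], 20)

# Cluster without using the labels
cluster_ids = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_demo)

# Rows are the true groups, columns are the discovered clusters; a good match
# has one large count per row and zeros elsewhere
table = pd.crosstab(true_labels, cluster_ids, rownames=["true group"], colnames=["cluster"])
print(table)

# Adjusted Rand index: 1.0 means the clustering matches the labels exactly
# (up to renaming the clusters); values near 0 mean chance-level agreement
ari = adjusted_rand_score(true_labels, cluster_ids)
print(ari)
```

With the penguin data, the analogous table would be `pd.crosstab(y_penguin_labels.to_numpy(), cluster_ids)`.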






## 2. Chatbots 

Large language models (LLMs) are taking over the world. I, for one, welcome our new robot [overlords](https://www.youtube.com/watch?v=8lcUHQYhPTE).

Let's explore how we can use a model from HuggingFace to create a chatbot.

To do this we need to install some additional packages. I recommend cloning your Jupyter environment, and then adding these packages to the new environment.


In [None]:
# uncomment the code in this cell and RUN IT ONLY ONCE to create a new conda environment 
# that will have the packages necessary to create a chatbot

# Note: this might not work. I recommend only trying this after you've finished all 
# the rest of the work for the class - i.e., after you've turned in your final project

#!conda create --name ydata123_2024c --clone ydata123_2024a
#!conda activate ydata123_2024c
#!conda install conda-forge::transformers -y
#!conda install pytorch::pytorch==2.2.2
#!conda install conda-forge::tensorflow -y
#!conda install conda-forge::flax -y


In [1]:
# Modified from code created by Giuliano Formisano

# load libs
import pandas as pd
import numpy as np
from transformers import pipeline  # `Conversation` is not used below, so it is not imported

# load conversational pipeline
chatbot = pipeline(model="facebook/blenderbot-400M-distill")



In [4]:
# set user input
user_input = "Who is better the Boston Red Sox or New York Yankees?" # add your prompt here

# generate response using pipeline
response = chatbot(user_input)

# print results
print(f"User: {user_input}")
print(f"Chatbot: {response[0]['generated_text']}")

User: Who is better the Boston Red Sox or New York Yankees?
Chatbot:  I would have to say the red sox. They have been around since 1903.


### Interactive user-chatbot loop

In [7]:
# Loop of interaction between the user and the chatbot
while True:
  user_input = input("You: ") # type your prompt in the box that appears below
  if user_input.strip().lower() == "quit": # type "quit" to stop the loop
    break
  response = chatbot(user_input) # this is a bit slow
  print(f"Chatbot: {response[0]['generated_text']}")
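
The loop above can also be written so that the model and the source of prompts are passed in as arguments, which makes it easy to try out without waiting on the model. This is a sketch only: `chat_loop` and `echo_bot` are hypothetical names, and `echo_bot` is a stand-in that simply echoes the prompt.

```python
from typing import Callable, Iterable, List, Tuple


def chat_loop(generate_reply: Callable[[str], str],
              prompts: Iterable[str],
              stop_word: str = "quit") -> List[Tuple[str, str]]:
    """Send each prompt to generate_reply until stop_word is seen; return the transcript."""
    transcript = []
    for prompt in prompts:
        # Stop as soon as the user types the stop word (ignoring case and spaces)
        if prompt.strip().lower() == stop_word:
            break
        reply = generate_reply(prompt)
        print(f"You: {prompt}")
        print(f"Chatbot: {reply}")
        transcript.append((prompt, reply))
    return transcript


# Hypothetical stand-in for the model, so the sketch runs without any download
def echo_bot(text: str) -> str:
    return f"You said: {text}"


transcript = chat_loop(echo_bot, ["Hello", "How are you?", "quit", "never reached"])
```

With the pipeline loaded in this notebook, `generate_reply` could be `lambda text: chatbot(text)[0]["generated_text"]`, and `prompts` could be `iter(lambda: input("You: "), "quit")` to read from the keyboard.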

You:  quit
