# [DM 2024] Lab 2 Environment settings

Hi everybody,


Our second lab session will take place on Monday, October 28, at 9:00 am, streamed live on our YouTube channel: [DM YouTube Channel](https://www.youtube.com/@ISA5810DataMining). Please be on time.

We highly recommend attending the session with your personal laptop (that way your environment will also be ready for the homework). Follow the instructions below to set up the environment:


## 1. Install libraries:

We will use several new Python libraries in this lab, including Gensim, TensorFlow, and Keras, along with the Ollama and LangChain packages listed in the commands below.

Once you have installed Python 3 (and optionally Anaconda), open a "Terminal" window (Linux/macOS) or a "Command Prompt" window (Windows) and type the following commands, pressing "Enter" after each one:

    pip3 install gensim
    pip3 install tensorflow
    pip3 install tensorflow-hub
    pip3 install keras
    pip3 install ollama
    pip3 install langchain
    pip3 install langchain_community
    pip3 install langchain_core
    pip3 install beautifulsoup4
    pip3 install chromadb
    pip3 install gradio
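
Once the commands above have finished, a quick check like the following can tell you whether anything is still missing. This is only a minimal sketch: the `PACKAGES` mapping and the `missing_packages` helper are names made up for illustration. Note that a few packages are imported under a different name than the one you give to `pip3` (for example, `beautifulsoup4` is imported as `bs4`).

```python
from importlib.util import find_spec

# pip name -> name used in `import ...` (hypothetical mapping built from the list above)
PACKAGES = {
    "gensim": "gensim",
    "tensorflow": "tensorflow",
    "tensorflow-hub": "tensorflow_hub",
    "keras": "keras",
    "ollama": "ollama",
    "langchain": "langchain",
    "langchain_community": "langchain_community",
    "langchain_core": "langchain_core",
    "beautifulsoup4": "bs4",
    "chromadb": "chromadb",
    "gradio": "gradio",
}


def missing_packages(packages):
    """Return the pip names whose import name cannot be found."""
    return [pip_name for pip_name, module in packages.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Still missing, try: pip3 install " + " ".join(missing))
    else:
        print("All lab libraries were found.")
```

`find_spec` only looks the module up without importing it, so this check runs quickly even for heavy libraries such as TensorFlow.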
    


---

## 2. Install Ollama in your device:

We will run some small open-source LLMs locally on your device using Ollama. Please download and install it from the [Ollama website](https://ollama.com/download).
Don't worry if your device cannot run it: Ollama can also be run online through Kaggle, which we will discuss later on.

After the installation is done, open your terminal and type **ollama**.
If the installation was successful, you should see the following information:

![pic8.png](attachment:pic8.png)

We will be using three open-source LLMs during this lab. It is recommended to have at least **5 GB of free RAM** to run them. Type the following commands to download the models:
- `ollama run llama3.2`
- `ollama run llama3.2:1b` (this is just in case the first one is too slow on your device)
- `ollama run llava-phi3`

After you run one of the commands, the model will start downloading, as shown below:



![pic7.png](attachment:pic7.png)

So download and install all of the models one by one.
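
If you would rather check the downloads from Python, the sketch below asks the local Ollama server which models it has and compares them against the three names above. It assumes Ollama is running at its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default local address
REQUIRED = ["llama3.2", "llama3.2:1b", "llava-phi3"]


def with_tag(name):
    """Ollama treats a model name without a tag as `name:latest`."""
    return name if ":" in name else name + ":latest"


def missing_models(required, installed):
    """Return the required models that do not appear in the installed list."""
    have = {with_tag(name) for name in installed}
    return [name for name in required if with_tag(name) not in have]


def installed_models(base_url=OLLAMA_URL):
    """Ask the local Ollama server which models it has (GET /api/tags)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [model["name"] for model in data.get("models", [])]


if __name__ == "__main__":
    try:
        print("Missing models:", missing_models(REQUIRED, installed_models()) or "none")
    except OSError as err:  # connection errors from urllib are OSError subclasses
        print("Could not reach Ollama. Is it running?", err)
```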

Once the downloads finish, you can verify each model by typing a question at its prompt in the terminal:

![pic9-2.png](attachment:pic9-2.png)
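
The same check can be done from Python. The following is a rough sketch, not the code used in the lab: it sends one prompt to the local Ollama server through its `/api/generate` endpoint using only the standard library, assuming the server listens on the default port 11434. The function names and the example prompt are made up for illustration. The `ollama` Python package installed in Section 1 offers a higher-level client for the same purpose.

```python
import json
import urllib.request


def build_request(model, prompt, base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the prompt to the model and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=120) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    for model in ["llama3.2", "llama3.2:1b"]:
        try:
            print(model, "->", ask(model, "Say hello in one short sentence."))
        except OSError as err:  # server not running, model missing, etc.
            print(model, "-> request failed:", err)
```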

---

## 3. Run Jupyter Notebook and check your environment:

Start a new Jupyter notebook server. To do this, open a "Terminal" window (Linux/macOS) or a "Command Prompt" window (Windows), type the following command, and press "Enter":

    jupyter notebook
    
If you receive an error message such as `zsh: command not found: jupyter`, type one of the following commands instead:

    python3 -m notebook 
or

    python -m notebook

Just like the image below:

![pic1.png](img/pic1.png)

A window like the one below should open in your browser. Click the "New" button in the top right corner and select "Python 3".

![pic2.png](img/pic2.png)

This will open a new notebook. You will be able to run "cells" of code and see their output printed below them, as well as write cells of text. If you want to learn more about how to use a notebook, read the documentation below:

https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Running%20Code.html

https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html

Once you have opened a new notebook, paste the script below into a cell and press the "Run" button (or the "Shift" + "Enter" keys). Make sure you get no errors!

    # import library
    import pandas as pd
    import numpy as np
    import nltk
    import matplotlib.pyplot as plt
    import seaborn as sns
    import itertools
    import umap
    import gensim
    import tensorflow
    import keras
    import ollama
    import langchain
    import langchain_community
    import langchain_core
    import bs4
    import chromadb
    import gradio

    %matplotlib inline

    print("gensim: " + gensim.__version__)
    print("tensorflow: " + tensorflow.__version__)
    print("keras: " + keras.__version__)

Output:

    gensim: 4.3.3
    tensorflow: 2.17.0
    keras: 3.6.0


**It should look similar to this:**

![pic3_2.png](attachment:pic3_2.png)
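
If one of the imports fails and you want to see which versions are actually installed, a small helper like the one below may help. It is only a sketch: `installed_version` is a hypothetical name, and the lookup uses the pip distribution names (for example `umap-learn` and `beautifulsoup4`) rather than the import names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ["gensim", "tensorflow", "keras", "umap-learn", "beautifulsoup4"]:
        print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

Your version numbers do not need to match the ones shown above exactly.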

---

## 4. Download a pre-trained Word2Vec model
If you are using a local Jupyter Notebook or Google Colab, download the pre-trained Google News vectors from:
https://code.google.com/archive/p/word2vec/

If you are using a Kaggle Kernel, you can add this dataset to your notebook: https://www.kaggle.com/datasets/keziaflaviana/googlenews
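
As a minimal sketch of how the downloaded file could be loaded with Gensim: the file name below is an assumption (adjust it to whatever you downloaded, and to the dataset path if you are on Kaggle), and `load_vectors` is a hypothetical helper. The `limit` argument keeps only the first vectors in the file, which reduces memory use.

```python
from pathlib import Path

# Assumed file name of the downloaded archive; adjust it to match your download.
MODEL_PATH = "GoogleNews-vectors-negative300.bin.gz"


def load_vectors(path=MODEL_PATH, limit=200_000):
    """Load the first `limit` word vectors from a binary word2vec file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Word2Vec file not found: {path}")
    # Imported here so the path check works even before gensim is installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(str(path), binary=True, limit=limit)


if __name__ == "__main__":
    try:
        vectors = load_vectors()
        print(vectors.most_similar("happy", topn=3))
    except FileNotFoundError as err:
        print(err)
```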

---

## Note:

If you have trouble installing some of the libraries (<span style="color:red">for example, Keras is not fully compatible with the MacBook M1 and M1 Pro</span>), please try using Google Colab (https://colab.research.google.com/) or Kaggle (https://www.kaggle.com/) instead.

### Step 1: Create a notebook and install the following libraries and Ollama

    !pip3 install scikit-learn --upgrade
    !pip3 install pandas --upgrade
    !pip3 install numpy --upgrade
    !pip3 install matplotlib --upgrade
    !pip3 install plotly --upgrade
    !pip3 install seaborn --upgrade
    !pip3 install nltk --upgrade
    !pip3 install umap-learn --upgrade

    !pip3 install gensim --upgrade
    !pip3 install tensorflow --upgrade
    !pip3 install keras --upgrade

    !pip3 install ollama --upgrade
    !pip3 install langchain --upgrade
    !pip3 install langchain_community --upgrade
    !pip3 install langchain_core --upgrade
    !pip3 install beautifulsoup4 --upgrade
    !pip3 install chromadb --upgrade
    !pip3 install gradio --upgrade

    # Install Ollama
    !curl -fsSL https://ollama.com/install.sh | sh

    # Start the Ollama server in the background (it runs as a separate process)
    import subprocess
    process = subprocess.Popen("ollama serve", shell=True)
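
Because the server is started in the background, it may need a moment before it accepts requests. If the following `ollama pull` cells fail with a connection error, a small wait loop such as this sketch can help. It assumes the server listens on Ollama's default address, `http://localhost:11434`; the function names are hypothetical.

```python
import time
import urllib.request


def server_is_up(url="http://localhost:11434"):
    """Return True if something answers HTTP requests at the Ollama address."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except OSError:  # connection refused, timeout, HTTP error, ...
        return False


def wait_until(probe, timeout=30.0, interval=0.5):
    """Call `probe` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the notebook, you could run `wait_until(server_is_up)` in its own cell and only continue once it returns `True`.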

    # Download model llama 3.2
    !ollama pull llama3.2

    # Download model llama 3.2:1b
    !ollama pull llama3.2:1b

    # Download model llava-phi3
    !ollama pull llava-phi3

If you are using Kaggle, skip to Step 2.

If you are using Google Colab, you might see the following warning in the output after the installation:

**You need to restart the runtime in order to use newly installed versions.** If so, press the **"RESTART RUNTIME"** button.

### Step 2: Run the following script

    # import library
    import pandas as pd
    import numpy as np
    import nltk
    import matplotlib.pyplot as plt
    import seaborn as sns
    import itertools
    import umap
    import gensim
    import tensorflow
    import keras
    import ollama
    import langchain
    import langchain_community
    import langchain_core
    import bs4
    import chromadb
    import gradio

    %matplotlib inline

    print("gensim: " + gensim.__version__)
    print("tensorflow: " + tensorflow.__version__)
    print("keras: " + keras.__version__)

The output should look similar to the previous image: no errors, and the version of each library printed.

### Step 3: Prepare the files

#### Google Colab
In this lab, we will need to load some text files as our data. If you are using Google Colab, you can import the files by following the instructions below:

- Copy this version of the lab in Colab and run it: [Lab-2 Colab](https://colab.research.google.com/drive/1T_o9NeByZDk2BL0aQbDkRO0rfhUBDdsp?usp=sharing)

Alternatively, you can mount your Google Drive:
- First download the ZIP of the [DM2024-Lab2-Master](https://github.com/didiersalazar/DM2024-Lab2-Master) repository, unzip it, and upload the entire folder to your Google Drive (simply drag the folder into Google Drive). After that, follow [this guide](https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA) to mount your Google Drive on your runtime and access the files.

- Assuming you put the unzipped "DM2024-Lab2-Master" folder at the top level of your Google Drive, here is how you need to slightly modify the code in Section 1.1 "Load data" in order to load the data.

![pic6.png](attachment:pic6.png)
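
As a rough sketch of the idea (the exact code from Section 1.1 is not reproduced here), the snippet below builds the folder that the lab's relative paths would be resolved against. It assumes Colab's usual Drive mount point, `/content/drive`, and that the folder sits at the top level of "My Drive". `lab_root` is a hypothetical helper.

```python
import sys
from pathlib import Path

# Assumed locations: Colab's usual Drive mount point and the folder name from the ZIP.
DRIVE_ROOT = Path("/content/drive/MyDrive")
LAB_FOLDER = "DM2024-Lab2-Master"


def lab_root(in_colab):
    """Return the folder that relative data paths should be resolved against."""
    return DRIVE_ROOT / LAB_FOLDER if in_colab else Path(".")


if __name__ == "__main__":
    in_colab = "google.colab" in sys.modules  # true inside a Colab runtime
    if in_colab:
        from google.colab import drive
        drive.mount("/content/drive")  # asks for permission to access your Drive
    print("Lab files expected under:", lab_root(in_colab))
```

On a local machine, the function simply returns the current directory, so the same notebook can run in both places.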



#### Kaggle
If you are using Kaggle, you can directly copy and edit this notebook:
https://www.kaggle.com/code/didiersalazar/dm2024-lab2-master

The file path should be correct. However, you can double check by running the cells. If you don't see any error, then you are good to go.



![pic_10.png](attachment:pic_10.png)

Also, please check before the lab that the computer you will use during the session works properly!

## Important Note:  If you're having installation issues with all of this, please ask your classmates or TAs for help well ahead of the lab session.

Good luck and see you on Monday!

Best regards,   
The TAs