## This homework assesses your ability to build and deploy applications.


`start date: Nov 22nd 11:59 PM` <br>
`due date: Dec 5th 11:59 PM`

### Make sure you submit your work to Brightspace.

`Total credits: 53 (up to 63 with bonus)`

You're welcome to share your thoughts about the homework and the course materials here: https://forms.gle/Kd9AoUZwkMiF8Vx5A

# P1: File path related questions

In the following structure, `project`, `data`, and `scripts` are folder names. `my_notebook.ipynb` is the Jupyter Notebook you used to create these folders.

```
project/
    data/
        file_name.csv
    scripts/
        script.py
my_notebook.ipynb
```

`P1.1 3 pts` Suppose I'm at the same level as the `project` folder, meaning I am outside `project` but in the same parent directory. I want to open `file_name.csv` with the following code in `my_notebook.ipynb`:

```python
df = pd.read_csv('file_name.csv')

```
Will this code work? If not, how should I revise it? 

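No, it won't work. A relative path is resolved against the notebook's current working directory, which here is the parent of `project`, so pandas looks for `file_name.csv` in that directory and raises `FileNotFoundError`. The path must include the `project/data/` folders: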
```python
import pandas as pd

df = pd.read_csv('project/data/file_name.csv')
```
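To see why the prefix is needed, the sketch below rebuilds the folder layout in a throwaway temporary directory (the CSV contents are made up) and checks which relative path resolves when the working directory is the parent of `project`.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Rebuild the P1 layout in a temporary folder with toy data
root = Path(tempfile.mkdtemp())
(root / "project" / "data").mkdir(parents=True)
(root / "project" / "data" / "file_name.csv").write_text("a,b\n1,2\n")

# Simulate the notebook running from the parent directory of project/
os.chdir(root)

print(os.getcwd())                                  # where relative paths start from
print(Path("file_name.csv").exists())               # False: looked up directly in root
print(Path("project/data/file_name.csv").exists())  # True

df = pd.read_csv(Path("project") / "data" / "file_name.csv")
print(df.shape)                                     # (1, 2)
```

`pathlib.Path` joins the folder names with the right separator on both Windows and Linux, which matters once the same code runs on an EC2 instance.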

`P1.2 3 pts` Suppose you are now in the `data` folder, and you want to rewrite your `script.py` file with the `%%writefile` magic while recording all your actions (e.g., `cd` commands, file path changes) in `my_notebook.ipynb`. Can you demonstrate how you would do this?

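Change the working directory to `data` in one cell, then use `%%writefile` in a second cell to create or overwrite `script.py` in the sibling `scripts` folder. `%%writefile` is a cell magic, so it must be the first line of its own cell.

```python
%cd project/data
```

```python
%%writefile ../scripts/script.py
print("This is the script.py file")
```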

`P1.3 3 pts` Suppose you are outside the `project` folder but in the same parent directory, and you want to rewrite your `script.py` file with the `%%writefile` magic while recording all your actions (e.g., `cd` commands, file path changes) in `my_notebook.ipynb`. Can you demonstrate how you would do this? Assume you did everything within `my_notebook.ipynb`.

The notebook is already in the parent directory of `project`, so no `cd` is needed (`%cd .` is a no-op, and any line placed before `%%writefile` breaks the cell magic). Use `%%writefile` with the path from the current directory to create or overwrite `script.py` in `project/scripts`:

```python
%%writefile project/scripts/script.py
# Your Python script content here
print("This is the script.py file")
```

`P1.4 3 pts` Suppose you are outside the `project` folder but in the same parent directory, and you want to create a subfolder named `test_data` under the `data` folder. Can you demonstrate how you would do this? Assume you did everything within `my_notebook.ipynb`.

Use the `os` module to create `test_data` under `project/data`; `exist_ok=True` prevents an error if the folder already exists.
```python
import os

os.makedirs('project/data/test_data', exist_ok=True)

print("Sub-folder 'test_data' created under 'project/data'")
```

`P1.5 5 pts` Suppose you are outside the `project` folder but in the same parent directory, and you want to import your `script.py` file as a module. Your `script.py` file has the following contents:

```python
df = pd.read_csv('file_name.csv')
```

You used the following code to do the import in `my_notebook.ipynb`:

```python
import script
```

What are the issues with the `script.py` file itself and with the way it is imported? Explain below and fix them yourself.
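
Issues with `script.py`:

- It uses `pd` without `import pandas as pd`, so importing the module raises `NameError`.
- `'file_name.csv'` is resolved against the notebook's working directory (the parent of `project`), not against `scripts/`, so the file is not found; the path has to lead to `project/data/file_name.csv`.
- The CSV is read as a side effect of `import`; wrapping the read in a function such as `load_data()` lets the caller decide when to load it.

Issue with the import:

- `import script` raises `ModuleNotFoundError` because `project/scripts` is not on Python's module search path, so the folder must be added to `sys.path` first.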

```python
# Add the 'project/scripts' folder to Python's module search path
import sys
sys.path.append('project/scripts')

# Import the script module
import script

# Use the load_data function to load the CSV file
df = script.load_data()

print(df.head())
```
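The notebook code above calls `script.load_data()`, which the original `script.py` does not define, so the file itself has to change. A possible fixed version is sketched below; the optional `path` parameter and the `__file__`-based default location are assumptions, chosen so the import works no matter which folder the notebook runs from.

```python
# project/scripts/script.py (sketch of a fixed version)
from pathlib import Path

import pandas as pd


def load_data(path=None):
    """Read the CSV only when called, not at import time."""
    if path is None:
        # Assumed default: project/scripts/script.py -> project/data/file_name.csv
        path = Path(__file__).resolve().parent.parent / "data" / "file_name.csv"
    return pd.read_csv(path)
```

Resolving the default from `__file__` ties the data location to the script's own folder rather than to the caller's working directory.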


`P1.6 6 pts` Suppose you are outside the `project` folder but in the same parent directory, and you want to first import the class `clean_data` from `script.py` and then create an instance of it. Your `script.py` file has the following contents:

```python
class clean_data:

    def __init__(self):
        df = pd.read_csv(`file_name.csv`)
```

You used the following code to import the class and create an instance:

```python
import script

clean_data = clean_data()

```

What are the issues with the `script.py` file itself (1 issue), the way it is imported (2 issues), and the way the instance is created (1 issue)? Explain below and fix them yourself.
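
- `script.py`: `df` is a local variable inside `__init__`, so it is discarded when the constructor returns; it should be stored as `self.df`. (The file also needs `import pandas as pd`, and the path must be quoted with `'...'`, not backticks.)
- Import issue 1: `project/scripts` is not on `sys.path`, so `import script` raises `ModuleNotFoundError`.
- Import issue 2: `import script` only binds the module name, so `clean_data` is undefined in the notebook; use `from script import clean_data` (or `script.clean_data`).
- Instance: `clean_data = clean_data()` rebinds the class name to the instance, so the class can no longer be used to create new objects; give the instance a different name such as `cleaner`.
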
```python
# Add the scripts folder to Python's module search path
import sys
sys.path.append('project/scripts')

# Import the clean_data class from script.py
from script import clean_data

# Create an instance of the clean_data class
cleaner = clean_data()

print(cleaner.df.head())
```
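The answer code reads `cleaner.df`, which the original class never sets. A sketch of a fixed `script.py` that stores the DataFrame on the instance; the optional `path` argument and the `__file__`-based default are assumptions, not part of the original file.

```python
# project/scripts/script.py (sketch of a fixed version)
from pathlib import Path

import pandas as pd


class clean_data:
    def __init__(self, path=None):
        if path is None:
            # Assumed default: project/scripts/script.py -> project/data/file_name.csv
            path = Path(__file__).resolve().parent.parent / "data" / "file_name.csv"
        # Keep the DataFrame on the instance so callers can read cleaner.df
        self.df = pd.read_csv(path)
```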

# P2. `30 pts` Coding challenges (Individual version)

For this question, you will build a movie recommendation system using K-Nearest Neighbors (KNN) and create a webpage interface with Streamlit. The Streamlit webpage should allow the user to input a movie name they have watched before, and based on that input, the system will recommend 4 similar movies. Additionally, you will deploy your Streamlit app on AWS EC2.

For this question, I will place fewer restrictions on the choice of data and features, allowing you to mimic real-world decision-making as data professionals. In previous assignments, I guided you step-by-step to teach you how to correctly code each small part. Now that you’ve gained those foundational skills, this assignment will focus more on exercising your ability to design and solve problems independently.

You are free to use any dataset you like for this assignment. The movie dataset from HW8 is sufficient, but you are welcome to explore and use other datasets if you prefer.

Your final submission should include:

- A screenshot of the Streamlit webpage displaying your app and its recommendations.
- The source code used to create the application.

`Bonus 10 pts` Use a pre-trained LLM to add a short description of the movie the user enters.

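One way to approach the bonus, sketched under assumptions: keep the prompt logic in a small function that accepts any text-generation callable, so it can be tested without downloading a model. The name `describe_movie` and the prompt wording are hypothetical; the commented lines show how it could be wired to a Hugging Face `transformers` text-generation pipeline, with `gpt2` used only as an example model.

```python
def describe_movie(title, generate):
    """Ask a text generator for a short description of `title`.

    `generate` is any callable that takes a prompt string and returns the
    generated text, possibly with the prompt echoed at the start.
    """
    prompt = f"Write a one-sentence description of the movie {title}:"
    text = generate(prompt)
    # Many generators echo the prompt; keep only the newly generated part
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Possible wiring to Hugging Face transformers (downloads a model on first use):
# from transformers import pipeline
# generator = pipeline("text-generation", model="gpt2")
# describe_movie("Toy Story (1995)",
#                lambda p: generator(p, max_new_tokens=40)[0]["generated_text"])
```

In the Streamlit app, the returned text could be shown with `st.write(...)` above the recommendations.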

```python
import pandas as pd

movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Turn "Adventure|Animation|..." into space-separated words for TF-IDF
movies['genres'] = movies['genres'].str.replace('|', ' ', regex=False)
movies.head()
```


| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure Animation Children Comedy Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure Children Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy Drama Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Represent each movie by the TF-IDF weights of its genre words
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])

# Brute-force KNN with cosine distance over the genre vectors
knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(tfidf_matrix)
```

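The Streamlit cell calls `recommend_movies`, but no cell defines it. A minimal sketch, assuming a case-insensitive substring match on the title; it uses a six-row toy copy of the `movies` table so it runs on its own. In the notebook you would drop the toy data and reuse the `movies`, `tfidf_matrix`, and `knn` objects built above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for movies.csv so the sketch runs on its own
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
              "Waiting to Exhale (1995)", "Father of the Bride Part II (1995)",
              "Heat (1995)"],
    "genres": ["Adventure Animation Children Comedy Fantasy",
               "Adventure Children Fantasy", "Comedy Romance",
               "Comedy Drama Romance", "Comedy", "Action Crime Thriller"],
})

tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["genres"])
knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(tfidf_matrix)


def recommend_movies(movie_name, n_recommendations=4):
    """Return titles of the movies whose genre vectors are closest to the match."""
    # Assumed lookup rule: first title containing the input, ignoring case
    matches = movies[movies["title"].str.contains(movie_name, case=False, regex=False)]
    if matches.empty:
        raise ValueError(f"Movie not found: {movie_name}")
    row = movies.index.get_loc(matches.index[0])
    # Ask for one extra neighbour because the closest one is the movie itself
    _, indices = knn.kneighbors(tfidf_matrix[row], n_neighbors=n_recommendations + 1)
    return [movies["title"].iloc[i] for i in indices[0] if i != row][:n_recommendations]
```

Raising `ValueError` for an unknown title lets the Streamlit cell show its "Movie not found" message instead of crashing.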

```python
import streamlit as st

st.title("Movie Recommendation System")

movie_name = st.text_input("Enter a movie you enjoyed:")

if st.button("Recommend"):
    if movie_name:
        try:
            # recommend_movies(title) must be defined earlier and return a list of titles
            recommendations = recommend_movies(movie_name)
            st.write("Here are some similar movies you might enjoy:")
            for rec in recommendations:
                st.write(f"- {rec}")
        except Exception:
            st.error("Movie not found. Please try another!")
    else:
        st.warning("Please enter a movie name.")
```


```
2024-12-04 23:23:12.967 
  command:

    streamlit run C:\Users\hibu\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\ipykernel_launcher.py [ARGUMENTS]
2024-12-04 23:23:12.970 Session state does not function when running a script without `streamlit run`
```

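This log appears because the Streamlit code was executed inside Jupyter; Streamlit pages only render when a script is launched with `streamlit run`. A minimal sketch of saving the app to a file first; the file name `app.py` and the shortened app body are assumptions, and the real file would also contain the data loading, KNN, and `recommend_movies` code.

```python
from pathlib import Path

# Shortened app body; the real file would also load data and define recommend_movies
app_code = '''\
import streamlit as st

st.title("Movie Recommendation System")
movie_name = st.text_input("Enter a movie you enjoyed:")
'''

Path("app.py").write_text(app_code)

# Then launch it from a terminal (locally or on the EC2 instance):
#   streamlit run app.py
# On EC2, binding to all interfaces lets the page be reached via the public IP:
#   streamlit run app.py --server.address 0.0.0.0 --server.port 8501
```

On EC2, the instance's security group also needs an inbound rule for the chosen port (8501 is Streamlit's default).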

```python
%pip install streamlit
```


# P2. `30 pts` Coding challenges (Group version)

For this assignment, you are allowed to collaborate with up to 2 classmates and turn it into a group project. You can choose any problem and dataset to work on, but your project must demonstrate 3 of the following skills. Note: You must include either 3 (Running the project in Docker) or 4 (Deploying the application on AWS EC2) as one of your chosen skills.

1. Creating a Python package.
2. Building a webpage using Streamlit with user input functionality.
3. Running the project in Docker (mandatory if 4 is not chosen).
4. Deploying the application on AWS EC2 (mandatory if 3 is not chosen).
5. Applying KNN to solve a problem.
6. Utilizing LLM models in your application.


In your submission, you must clearly list your teammates' names, and each team member must submit the assignment individually on Brightspace.

Your final submission should include:

- A screenshot of the Streamlit webpage displaying your app and its functionality.
- The source code for the application.

`Bonus 10 pts` Prepare a roughly 5-minute presentation to deliver in class. If you plan to present, please notify me at least 2 days before the deadline so the course schedule can be adjusted. Bonus points will be awarded after the presentation.