# Homework 1: Introduction to Python
In this homework, we will gain more experience with Python through file processing. With this guide, you will be able to:
- read from a text (.txt) file
- process the words of the file
- output the result of this process

### Instructions

1. Follow the instructions on how to set up your Python and Jupyter (or VSCode) environment and how to clone or download our repository. Instructions can be found in the class notes:
   https://filipinascimento.github.io/usable_ai/m00-setup/class
2. Ensure that you have Python and Jupyter Notebook working. (You can also try using Google Colab. This is not the preferred method for this homework, but it is an option.)
3. Load the text files: `story-1.txt`, `story-2.txt`, `story-3.txt`, and `story-4.txt`, located in the `Datasets` directory. (If you are using Google Colab, you will need to upload the files to the Colab environment.)
4. Answer the questions below by writing or completing the code in the provided cells.

### Dataset Overview
The dataset consists of four text files, each containing a story. The stories are:

- `story-1.txt`: The Monkey and the Crocodile
- `story-2.txt`: The Musical Donkey
- `story-3.txt`: A Tale of Three Fish
- `story-4.txt`: The Foolish Lion and the Clever Rabbit

### Submission Guidelines

- Submit your completed notebook as an HTML export or a PDF file.

To export to HTML, if you are on Jupyter, select `File` > `Export Notebook As` > `HTML`.

If you are on VSCode, you can use the `Jupyter: Export to HTML` command:
- Open the command palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac).
- Search for `Jupyter: Export to HTML`.
- Save the HTML file to your computer and submit it via Canvas.

---

> 
> **Using Generative AI Responsibly**
>
> You're welcome to use Generative AI to assist your learning, but focus on understanding the concepts rather than just solving the assignment. For example, instead of copying and pasting the question into the model, ask it to explain the concept in the question. Try asking: `How can I open a file in Python? Can you give me examples?` or `What functions and methods can I use to extract the words of a text file? Can you explain how they work with some examples?`
>
> This way, you will learn how the solution works while building your skills. Remember to give context to the generative AI, so it can better assist you. Talk to the instructor and AIs if you have any questions or need insights.


To begin, we need to make sure the functions (sets of instructions) we want to use are available in this file. We'll do this with an import statement, followed by the name of the library we need. In this case, we will use the `os` library. In the cell below, type the library name where the `_` is. In the remainder of this guide, you will be filling in a value wherever a `_` or `...` appears.

In [None]:
# TODO - Enter Library Name
import _

Next, we'll need to put the text files in a known location. The cell below prints the current path. Place the text files in this directory to ensure your script can find them.

In [None]:
# Getting the current path and displaying it
current_path = os.getcwd()
print(current_path)

You can compose paths using the `os.path.join()` function. This function takes one or more path components as arguments. In the simplest case, the first is the directory path and the second is the file name. The function returns a string with the path to the file.

For example, suppose you have a directory named `data` and a file named `story-1.txt`. You can compose the path to the file using the following code:


In [None]:
# Composing the path to the file
a_path = os.path.join('data', 'story-1.txt')
print(a_path)

Now let's create the directory path to the `Datasets` folder. You can use `".."` to refer to the parent directory. For example, to compose a path to a `Datasets` folder located in the parent of the current directory:

```python
path_to_datasets = os.path.join("..", "Datasets")
```

You can use multiple path components to compose a new path. For example, to go up two levels from `a_path` and refer to a file named `a_file.txt` there, you can use:

```python
path_to_file = os.path.join(a_path, "..", "..", "a_file.txt")
```
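
As a minimal sketch of how `".."` behaves, the snippet below joins a few components and then collapses the `".."` with `os.path.normpath`, a standard-library function that is not otherwise used in this homework. The directory names `data` and `stories` are hypothetical and only serve as an illustration.

```python
import os

# Hypothetical directory and file names, used only for illustration
raw_path = os.path.join("data", "stories", "..", "story-1.txt")
print(raw_path)

# normpath collapses "stories" followed by ".." so the path points directly inside "data"
clean_path = os.path.normpath(raw_path)
print(clean_path)
```

Both strings normally refer to the same location; the second is just easier to read. The separator that gets printed depends on your operating system.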

In the cell below, complete the code to create the path to the `Datasets` folder. Remember you already have the current path from previous cells.

In [None]:
# Defining the datasets directory path
# TODO - Enter the path to the datasets
dataset_path = os.path.join(_)
print(dataset_path)

## Opening and Reading one of the text files
Let's open the first file (`story-1.txt`) and read its content.  <br> We will need to call `read()` on `fp` to read the text from the file. <br> 

In [None]:
# TODO - Fill in the stories filename
file_path = os.path.join(dataset_path,_)
# Opening the file
# TODO - Fill in the mode used to open the file
with open(file_path, _) as fp:
    # TODO - Fill in the variable that represents the file we are working with
    content = _.read()
    # now you should have the content of the file in the variable content
print("Content of the file:",content)

You may also find code that uses `try` and `except`. The code inside the `try` block runs normally. If an error occurs, Python jumps to the `except` block, where we print the error to the screen. This is a way to catch any errors that may occur while reading the file.

For instance:

```python
try:
    with open("a_file.txt", "rt") as fp:
        text = fp.read()
        print(text) 
except Exception as e:
    print("Error reading file:", e)
```
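
To see the `except` block in action, here is a minimal sketch that tries to open a file that does not exist. The file name `no_such_file.txt` is made up for illustration, and the path is built inside a freshly created temporary folder so that the file is guaranteed to be missing.

```python
import os
import tempfile

message = None

# A new temporary folder is empty, so the file below cannot exist
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, "no_such_file.txt")
    try:
        with open(missing_path, "rt") as fp:
            text = fp.read()
            print(text)
    except FileNotFoundError as e:
        # FileNotFoundError is the specific exception Python raises for a missing file
        message = f"Error reading file: {e}"
        print(message)
```

Catching `Exception`, as in the example above, also handles this case. Naming `FileNotFoundError` only makes the intent more explicit.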

Add a `try` and `except` block to the code that you completed in the previous cell to catch any errors that may occur while reading the file.

In [None]:
# TODO - Add a try except block to catch the exception
try:
    # TODO - Fill in the file name
    # Add your previous cell code here
    ...
except Exception as e:
    print("Error reading file:", e)

## Counting Words
Now that we can read the file, we want to split its content into words, store them in the `words` variable, and count them. Take a look at the lessons and/or the Python documentation to find out how to split the text into words. I also suggest you use the `lower()` method to convert all words to lowercase. This way, we can count the words without worrying about their case.
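
To illustrate why lowercasing helps, here is a small sketch using a hand-written list of words. The list is made up and stands in for the words you would get from a file. Without `lower()`, `The` and `the` are treated as different words.

```python
# Hypothetical words, written by hand for illustration
sample_words = ["The", "lion", "saw", "the", "Lion"]

# A set keeps only distinct values, so its size is the number of different words
distinct_before = set(sample_words)
distinct_after = {word.lower() for word in sample_words}

print(len(distinct_before))  # 5 distinct spellings
print(len(distinct_after))   # 3 distinct words: 'the', 'lion', 'saw'
```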

In [None]:
# TODO - Convert content to lowercase and split it into words
lower_content = content.lower()
words = ... 

# The result should be a Python list of words, for example: "This is a test" -> ['this', 'is', 'a', 'test']

Now create a dictionary and start counting the words by using a for loop. If the word is not in the dictionary, add it with a count of 1. If the word is already in the dictionary, increment the count by 1.

In [None]:
# Count each word using a dictionary and a for loop
word_count = {}
for word in words:
    if word in word_count:
        # TODO - Increment the count
        ...
    else:
        # TODO - Add the word to the dictionary with a count of 1
        ...

print(word_count)
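
Once your loop works, one optional way to sanity-check it is to compare the result against `collections.Counter` from the Python standard library, which counts the items of a list for you. This is only a cross-check, not a replacement for writing the loop yourself. The word list below is made up for illustration.

```python
from collections import Counter

# Hypothetical list of already-lowercased words
sample_words = ["the", "fish", "saw", "the", "other", "fish"]

# Counter builds a dictionary-like object mapping each word to its count
expected_counts = Counter(sample_words)
print(dict(expected_counts))
```

In your notebook, `dict(Counter(words)) == word_count` should evaluate to `True` if the loop counted correctly.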



## Repeating the process for all files
Now that you have the code to read and count the words in a file, you can repeat this process for all the files. You can create a function that reads a file and returns the word count. Then you can call this function for each file.

Create a function `count_words_in_file(file_path)` that reads a file and returns a dictionary with the word count. The function should take the file path as an argument.

Use your previous code as a base to create this function. You can copy and paste the code you wrote before into the function.
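
The general pattern of wrapping file-reading code in a function can be sketched with a different, simpler task. The function `count_characters_in_file` and the temporary file below are hypothetical and are not part of the homework. They only show how a path goes in as an argument and a result comes back with `return`.

```python
import os
import tempfile

def count_characters_in_file(file_path):
    # Read the whole file and return how many characters it contains
    with open(file_path, "rt") as fp:
        content = fp.read()
    return len(content)

# Create a small temporary file so the example runs on its own
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.txt")
    with open(example_path, "wt") as fp:
        fp.write("a short story")
    n_characters = count_characters_in_file(example_path)

print(n_characters)
```

Notice that the function body only uses `file_path`, the name of its argument, and never a path defined outside the function.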

In [None]:
def count_words_in_file(file_path):
    # TODO - Copy the code from the previous cells here, make sure that file_path 
    ...
    return word_count

Let's test the function with another story file, like `story-2.txt`.

In [None]:
# You should be able to test your function by running the following code:
file_path = os.path.join(dataset_path, "story-2.txt")
counts_dictionary = count_words_in_file(file_path)
print(counts_dictionary)

We can now create a list of the file names to loop through one by one. <br> Notice that `stories` is a `list`, as it is assigned values enclosed in brackets (`[]`).

In [None]:
# List of file names to read from the same directory
# TODO - Enter the file names inside the list
stories = [...]

Let's now loop through the list of files and call the function `count_words_in_file` for each file. We will store the result in a list of dictionaries called `word_counts`.

In [None]:
word_counts = []

for story in _:
    # TODO - Fill in the file path
    file_path = os.path.join(dataset_path, story)
    # TODO - Call the function count_words_in_file with the file_path
    word_count = ...
    # TODO - Append the result to the word_counts list
    ...

print(word_counts)

And there you have it! Submit the completed version of this assignment for points. Save the notebook as a PDF or HTML file and submit it via Canvas. Keep the output of the code cells visible in the exported file.