#Question 1

Write a program to request a file name from the user and calculate the
following statistics of the contents of the file:
1. Number of lines
2. Number of words
3. Number of characters
4. Average length of a word

In this problem use the following definitions:

A line is a sequence of characters that ends with a newline (\n) character.

A word is a sequence of characters bounded by one or more spaces (or \n) on either side of it (or both sides).

A character is any string of length one, e.g. ‘a’, ‘-’, etc., but not a space (or other white space).

Note: If your file statistics are different from the answer I have given above, please explain in your notes/markdown script how you arrived at your answers. For example, if you use the readlines() function, it will count the last line, which does not end with a newline (\n), as a line; that is fine as long as you understand it and are able to explain it.
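
The difference described in the note can be seen on a small in-memory string. This is a minimal sketch, assuming a hypothetical sample text whose last line has no trailing newline; the variable names are illustrative and not part of the assignment.

```python
import io

# Hypothetical sample: two newline-terminated lines plus an unterminated last line
sample = "first line\nsecond line\nlast line without newline"

# Strict definition: a line must end with '\n'
strict_count = sample.count("\n")

# readlines() also returns the trailing unterminated line
readlines_count = len(io.StringIO(sample).readlines())

print(strict_count)     # 2
print(readlines_count)  # 3
```

Either count is defensible as long as the chosen definition is stated alongside the result.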

###Algorithm

1. Define a function named file_info that takes a file name as input (user).
  * Open the file in read mode using a with statement to ensure proper handling of file resources.
  * Read the content of the file.
  * Count the number of lines in the file.
  * Split the content into words and count the number of words.
  * Count the total number of characters in the file.
  * Calculate the average length of a word in the file.
  * Print the number of lines, words, characters, and the average length of a word.
2. Within the function, handle the FileNotFoundError exception in case the specified file is not found. Print an error message in this case.
3. Get user input for the file name.
4. Call the file_info function with the user input as an argument.
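
The counting steps above can also be sketched as a pure function that works on a string rather than a file, which makes the definitions easy to check by hand. The name `text_stats` is hypothetical, and counting an unterminated final line as a line is an assumption made here, in line with the note in the question.

```python
def text_stats(content):
    # Lines: newline-terminated lines, plus a final unterminated line if any
    line_count = content.count("\n")
    if content and not content.endswith("\n"):
        line_count += 1
    # Words: runs of non-whitespace separated by spaces or newlines
    words = content.split()
    # Characters: everything except white space
    char_count = sum(len(word) for word in words)
    # Average word length, guarding against an empty file
    average = char_count / len(words) if words else 0
    return line_count, len(words), char_count, average

print(text_stats("the quick fox\njumps\n"))  # (2, 4, 16, 4.0)
```

Returning the values instead of printing them keeps the calculation separate from the user interaction.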

###Code Implementation

In [None]:
#To define a function that takes the file name as input
def file_info(user):
  try:
    #To open the file in read mode
    with open(user, 'r') as f:
      #To read the content of the file
      content = f.read()

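      #To count the lines in the file: each '\n' ends a line, and a final
      #line without a trailing '\n' is counted as well
      count_lines = content.count('\n')
      if content and not content.endswith('\n'):
        count_lines += 1
      print(f"Number of lines: {count_lines}")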

      #To split the content into words and count them
      words = content.split()
      count_words = len(words)
      print(f"Number of words: {count_words}")

      #To count the total number of characters in the file, excluding white space
      count_characters = sum(len(word) for word in words)
      print(f"Number of characters: {count_characters}")

      #To calculate the average length of a word in the file
      if count_words > 0:
        avg_length =  sum(len(word) for word in words) / len(words)
      else:
        avg_length = 0
      print(f"Average length of a word: {avg_length}")

  #To handle exception when the input is incorrect
  except FileNotFoundError:
    print(f"An error has occurred: File '{user}' not found.")

#To get user input
user = input("Enter file name: ")

#To call the function using the user input
file_info(user)

Enter file name: dnfbcmns
An error has occurred: File 'dnfbcmns' not found.


#Question 2

A string is an anagram of another if the second string is simply a scrambled version of the first. Write a Python program to implement the following game:

a) The program reads in a text file that contains words and their meanings. An example “words and their meanings” file is given on Canvas. Note that your program needs to ask the user which “words and their meanings” file to use.

b) The words and their meanings text file is in CSV (comma-separated values) format. Use either Notepad++ or Notepad to create your file in the same format as the mywords.txt file given to you on Canvas.

c) Your program should then pick a word from the “words and their meanings” file, scramble the letters, and ask the user to unscramble it. Every run of your program should pick a word at random.

d) The user may type in the unscrambled word or may ask for the meaning of the word by entering a question mark.

e) The user gets as many attempts as there are letters in the word. For example, if the word is “poor”, the program will allow four attempts.

f) The program will provide the definition of the word (at the user’s request) only once. If the user enters “?” more than once, an error message will be given with the warning that the next input of a “?” will be counted as an attempt at the answer.

g) The game repeats with a new word until the user says “no” to the question: “Do you want to continue?”.
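
The anagram definition above can be checked by comparing sorted letters. The sketch below also shows one possible way to scramble a word so that it never comes back unchanged. Both function names (`is_anagram`, `scramble`) are hypothetical, and the reshuffle-until-different rule is an added assumption that the assignment does not require.

```python
import random

def is_anagram(first, second):
    # Two strings are anagrams when their sorted letters match
    return sorted(first.lower()) == sorted(second.lower())

def scramble(word, rng=random):
    # Words made of a single repeated letter cannot be changed by shuffling
    if len(set(word)) < 2:
        return word
    letters = list(word)
    # Shuffle until the result differs from the original word
    while True:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != word:
            return candidate

print(is_anagram("poor", "roop"))  # True
print(is_anagram("poor", "pool"))  # False
```

Passing a seeded `random.Random` instance as `rng` would make the scrambling repeatable for testing.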

###Algorithm

1. Read the File (read_the_file function):
  * Read a file with words and meanings.
  * Create a dictionary to store word-meaning pairs.
  * Populate the dictionary with data from the file.
  * Return the dictionary.
2. Shuffle Word Letters (rearranged function):
  * Shuffle the letters of a given word.
3. Word Unscrambling Game (game function):
  * Ask the player if they want to play.
  * While the player wants to play:
    * Choose a random word, shuffle its letters, and display it.
    * Allow the player attempts to unscramble the word.
    * Provide the meaning if requested (once).
    * Congratulate the player on correct unscrambling or end the game if attempts are exceeded.
    * Ask if the player wants to play again.
4. Main Program (main function):
  * Get the file name from the user.
  * Attempt to read the file and handle errors.
  * If successful, play the word unscrambling game.
5. Program Execution:
  * Get the file name, read the file, and store word-meaning pairs.
  * Play the word unscrambling game with the player.
  * Handle errors if the file is not found.
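
The file-reading step could also be sketched with the standard csv module, which copes with quoted meanings that contain commas. This is only a sketch: it assumes each row has the form word,meaning, the helper name `parse_words` is hypothetical, and the sample rows are made up rather than taken from mywords.txt.

```python
import csv
import io

def parse_words(lines):
    # Build {word: meaning}; csv.reader handles quoted fields that contain commas
    words_and_meanings = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        word = row[0].strip()
        # Rejoin any extra fields so unquoted commas in the meaning are kept
        meaning = ",".join(row[1:]).strip()
        words_and_meanings[word] = meaning
    return words_and_meanings

# Hypothetical file contents, used here instead of a real mywords.txt
sample = io.StringIO('poor,having little money\n\nbrief,"short, concise"\n')
print(parse_words(sample))
# {'poor': 'having little money', 'brief': 'short, concise'}
```

An open file object can be passed to `parse_words` in place of the `io.StringIO` sample.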


###Code Implementation

In [None]:
import random

#To read a file and store the data in a dictionary
def read_the_file(file_name):
  words_and_meanings = {}
  with open(file_name, 'r') as f:
    for row in f:
      #To skip blank lines and rows without a comma
      if ',' not in row:
        continue
      #To split each row on the first comma only, so meanings may contain commas
      word, meaning = row.strip().split(',', 1)
      words_and_meanings[word.strip()] = meaning.strip()
  return words_and_meanings

#To shuffle the letters of a word
def rearranged(word):
  scrambled = list(word)
  random.shuffle(scrambled)
  return ''.join(scrambled)

#To implement the word unscrambling game
def game(words_and_meanings):
  #To ask the player if they want to play
  play = input("Do you want to play? (y/n)").lower()

  #To continue playing as long as the player wants
  while play == 'y':
      #To choose a random word from the dictionary
      word = random.choice(list(words_and_meanings.keys()))
      scrambled_word = rearranged(word)
      trials = len(word)
      computer_output = False

      #To display the scrambled word to the player
      print(f"Unscramble the following letters to form a word: {scrambled_word}")

      #To allow the player one attempt per letter in the word
      attempts = 0
      warned = False
      solved = False
      while attempts < trials and not solved:
        #To get user input
        user = input("Enter the answer [or ? for the meaning]: ").strip().lower()

        if user == '?' and not computer_output:
          #To provide the meaning once, without using up an attempt
          print(f"The word means: {words_and_meanings[word]}")
          computer_output = True
        elif user == '?' and not warned:
          #To warn the player that the next '?' will count as an attempt
          print("You have been given the meaning before. Next time you ask for the meaning, it will count as an attempt!")
          warned = True
        elif user == word.lower():
          print("You got it!")
          solved = True
        else:
          #A wrong answer (or a further '?') uses up an attempt
          attempts += 1
          print("Wrong, try again")

      #To check if the player exceeded the number of attempts
      if not solved:
        print("Wrong, you have exceeded the number of attempts. Bye!")

      #To ask the player if they want to play again
      play = input("Do you want to play again? (y/n)").lower()

  print("Goodbye!")

#To execute the program
def main():
  #To get the user to input file name
  file_name = input("Give the name of the 'words and their meanings' file:")

  try:
    #To attempt to read the file and play the game
    words_and_meanings = read_the_file(file_name)
    game(words_and_meanings)
  except FileNotFoundError:
    print(f"File '{file_name}' not found.")

#To call the main function to run the program
main()

Give the name of the 'words and their meanings' file:/content/drive/MyDrive/Colab Notebooks/Homework 2 Datasets/mywords.txt



#Question 3

The file poetry_lines.txt is given to you (see Canvas for the file
poetry_lines.txt).
The file contains names of poets and an extract of their poetry. New lines in each poem are represented by a ‘/’. The format of a line is the following:

<Poet’s name>:<poetry delineated by ‘/’>\n

The first string in each line is the name of the author followed by a ‘:’, followed by the poetry which is delineated by ‘/’ to represent a new line in the poem.

The next line contains the next poem and so on.

You are required to input a few lines of your own poem to the Python program (with lines separated by “/”) and compute the cosine distance (similarity score) between each line (of poetry from the file) and your own poem. Finally, your program should display the following:

* Each poet and the similarity score with your poem.
* Finally display the poem that is closest to your input.

An example of how to compute the cosine distance between two lines of text is given below:

Line1: Hi Hi Hi how are you

Line2: Hi how are u u

1. The cosine similarity between two vectors (A and B) is given by the formula:

                similarity = cos(theta) = A.B / (||A|| ||B||)

2. Create a dictionary of keys with the words and the values with the number of occurrences.

  a. Line1dict = {‘Hi’:3, ‘how’:1, ‘are’:1, ‘you’:1, ‘u’:0}

  b. Line2dict = {‘Hi’:1, ‘how’:1, ‘are’:1, ‘you’:0, ‘u’:2}

3. The values in the two dictionaries form the two vectors to be compared. For example:

  a. Line1vector = (3,1,1,1,0)

  b. Line2vector = (1,1,1,0,2)

4. Now you can implement the cosine similarity formula above to compare Line1vector and Line2vector.
5. Note that the cosine distance (similarity score) should be between 0 and 1, since word counts are never negative.
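
For the two example lines, the dot product is 3·1 + 1·1 + 1·1 + 1·0 + 0·2 = 5 and the vector lengths are √12 and √7, so the similarity is 5/√84 ≈ 0.5455. A minimal sketch of the same calculation follows; the function name `cosine_similarity` is hypothetical, and using `collections.Counter` for the word counts is a choice made here rather than something the assignment prescribes.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Word-count vectors; Counter returns 0 for words that are missing
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())
    vocabulary = set(counts_a) | set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in vocabulary)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    # An empty text has no direction, so report zero similarity
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

score = cosine_similarity("Hi Hi Hi how are you", "Hi how are u u")
print(round(score, 4))  # 0.5455
```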

Your program should provide the following:

1. Ability to read in a user provided filename which contains names of poets and an extract of their poetry in the specified format.
2. Read in user’s own poem (using the input() method).
3. Functionality to compute the similarity (use cosine distance) between each poem and the user’s poem and display the results.
4. Finally choose and display the poem that is most similar to the user’s input.

###Algorithm

1. Define line_analysis function to preprocess a line:
  * Tokenize the line into words.
  * Create a word frequency dictionary for the line.
  * Return the word frequency dictionary.
2. Define similarity_score function to calculate cosine similarity:
  * Calculate the dot product of two vectors.
  * Calculate the magnitude of each vector.
  * Avoid division by zero.
  * Return the cosine similarity.
3. Main Function (main):
  * Read a poetry file specified by the user.
  * If the file is not found, print an error message and exit.
  * Prompt the user to input their poem, delimited by '/' for each line.
  * Preprocess the user's poem using line_analysis.
  * Compute cosine similarity between the user's poem and each poem from the file.
  * Display the cosine similarity results for each poet.
  * Identify and display the closest poem and poet.
4. Program Execution:
  * Get the name of the poetry file from the user.
  * Read the file and store each line in a list (poem_lines).
  * Get the user's poem input and preprocess it.
  * Compute and display the cosine similarity between the user's poem and each poem from the file.
  * Identify and display the closest poem and poet based on cosine similarity.
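
Reading one entry of the poetry file can be sketched as below, assuming the `<poet>:<poem>` format described in the question. The helper name `parse_poem_line` and the sample entry are hypothetical; the sample is not taken from poetry_lines.txt.

```python
def parse_poem_line(line):
    # Split on the first ':' only, so colons inside the poem are preserved
    poet, _, poem = line.strip().partition(":")
    # '/' marks a line break inside the poem
    verses = [verse.strip() for verse in poem.split("/") if verse.strip()]
    return poet.strip(), verses

# Hypothetical entry in the <poet>:<poem> format
poet, verses = parse_poem_line("Example Poet:first verse/second verse: continued\n")
print(poet)    # Example Poet
print(verses)  # ['first verse', 'second verse: continued']
```

Splitting on '/' before counting words keeps the last word of one verse from being glued to the first word of the next.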

###Code Implementation

In [None]:
import math

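def line_analysis(line):
  # Function to preprocess a line by tokenizing and creating a word frequency dictionary
  # '/' marks a line break in the poem, so treat it as white space
  words = line.replace('/', ' ').split()
  count = {}
  # Update the dictionary word by word so repeated words are counted correctly
  for word in words:
    count[word] = count.get(word, 0) + 1
  return count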

def similarity_score(v1, v2):
  # Function to calculate cosine similarity between two vectors
  dot_prod = sum(v1[key] * v2[key] for key in v1 if key in v2)
  mod_v1 = math.sqrt(sum(value ** 2 for value in v1.values()))
  mod_v2 = math.sqrt(sum(value ** 2 for value in v2.values()))

  if mod_v1 == 0 or mod_v2 == 0:
    return 0  # Avoid division by zero

  return dot_prod / (mod_v1 * mod_v2)

def main():
  # Step 1: Read in the poetry file
  read_file = input("Give the name of the poetry file: ")
  try:
    with open(read_file, 'r') as f:
      poem_lines = f.readlines()
  except FileNotFoundError:
    print(f"File '{read_file}' not found.")
    return

  # Step 2: Read in user's poem
  user = input("Input your poem delineated by '/' for each line: ")

  # Preprocess user's poem
  user_vec = line_analysis(user)

  # Step 3: Compute cosine similarity and display results
  cosine_similarity_result = {}
  poems = {}

  for line in poem_lines:
    if ':' not in line:
      continue  # Skip blank or malformed lines
    # Split on the first ':' only, so colons inside the poem are kept
    poet_name, poem = line.strip().split(':', 1)
    poem_vec = line_analysis(poem)
    cosine_similarity_result[poet_name] = similarity_score(user_vec, poem_vec)
    poems[poet_name] = poem

  if not cosine_similarity_result:
    print("No poems found in the file.")
    return

  # Display cosine distance results
  print("Cosine distance results:")
  for poet_name, similarity in cosine_similarity_result.items():
    print(f"{poet_name}: {similarity}")

  # Step 4: Identify and display the closest poem
  closest_poet = max(cosine_similarity_result, key=cosine_similarity_result.get)

  print("The closest poem is:")
  print(f"{closest_poet}: {poems[closest_poet]}")

main()

Give the name of the poetry file: jbhj
File 'jbhj' not found.


#Question 4

This is a project to scrape data from the web and store the results in a text
file and an SQLite database.


The website https://finance.yahoo.com/trending-tickers lists extensive finance
data. You have to write Python scripts/programs to collect the current prices
for the following commodities: Crude Oil, Gold and Silver.

Your program should store the commodity name and its corresponding price in a text file called commodity_prices.txt.

In addition to the commodity_prices.txt file, the data should also be stored in an SQLite database called CommodityDatabase in the directory that your Jupyter Notebook code will be executed from.

The CommodityDatabase should have a table called CommodityTable that contains the following columns and types:

* Ticker TEXT
* Price REAL

Every execution of your program should create a new commodity_prices.txt and CommodityDatabase.db file in the directory where your Python script is located and run (delete any existing copies of these files first).




###Algorithm

1. Install the necessary Python libraries.
2. Remove Existing Files:
  * Iterate over the list of file names ('commodity_prices.txt' and 'CommodityDatabase.db').
  * Check if each file exists using os.path.exists().
  * If a file exists, remove it using os.remove().
3. Define Ticker Symbols and Commodity Names Mapping:
  * Create a dictionary (commodity_map) that maps ticker symbols to commodity names.
4. Fetch Current Prices Using yfinance:
  * Get a list of ticker symbols from the commodity_map dictionary.
  * Use yf.download() to fetch historical financial data for the tickers with a period of one day.
  * Extract the closing prices for the tickers from the downloaded data.
  * Create a dictionary (prices) that maps commodity names to their corresponding rounded closing prices.
5. Save Data to Text File:
  * Open the file 'commodity_prices.txt' in write mode.
  * Write each commodity name and its price in the specified format to the file.
6. Save Data to SQLite Database:
  * Connect to the SQLite database file ('CommodityDatabase.db').
  * Create a cursor object to interact with the database.
  * Execute a SQL command to create a table named CommodityTable with columns Ticker and Price.
  * Use the executemany() method to insert multiple rows of data into the CommodityTable from the prices dictionary.
  * Commit the changes to the database.
  * Close the database connection.
7. Print Success Message:
  * Print a message indicating that the data has been fetched and stored successfully.


  Note: I used the yfinance library to fetch the data because I was unable to scrape it directly from the Yahoo Finance website.
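
The database steps can be tried without network access or files on disk. The sketch below assumes made-up prices (`sample_prices` is a hypothetical name with illustrative values only) and uses an in-memory database so that CommodityDatabase.db is not touched.

```python
import sqlite3
from contextlib import closing

# Hypothetical prices for illustration only; real values come from the data source
sample_prices = {"Crude Oil": 70.0, "Gold": 2000.0, "Silver": 25.0}

# ':memory:' keeps the sketch from touching CommodityDatabase.db on disk
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE CommodityTable (Ticker TEXT, Price REAL)")
    # Each (name, price) pair from the dictionary becomes one row
    conn.executemany("INSERT INTO CommodityTable VALUES (?, ?)", sample_prices.items())
    conn.commit()
    rows = conn.execute(
        "SELECT Ticker, Price FROM CommodityTable ORDER BY Ticker"
    ).fetchall()

print(rows)  # [('Crude Oil', 70.0), ('Gold', 2000.0), ('Silver', 25.0)]
```

Replacing ':memory:' with 'CommodityDatabase.db' would write the same table to a file.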

###Code Implementation

In [None]:
import yfinance as yf
import sqlite3
import os

#To remove existing files
for file in ['commodity_prices.txt', 'CommodityDatabase.db']:
  if os.path.exists(file):
    os.remove(file)

#To map ticker symbols to commodity names
commodity_map = {'CL=F': 'Crude Oil', 'GC=F': 'Gold', 'SI=F': 'Silver'}

#To fetch current prices using yfinance
tickers = list(commodity_map.keys())
data = yf.download(tickers, period='1d')['Close']
#To look up each column by ticker label rather than by position
prices = {commodity_map[t]: round(float(data[t].dropna().iloc[-1]), 2) for t in tickers}

#To save data to text file
with open('commodity_prices.txt', 'w') as file:
  file.write('\n'.join(f"{commodity}: ${price}" for commodity, price in prices.items()))

#To save data to SQLite database
conn = sqlite3.connect('CommodityDatabase.db')
cursor = conn.cursor()
cursor.execute('CREATE TABLE CommodityTable (Ticker TEXT, Price REAL)')
cursor.executemany('INSERT INTO CommodityTable VALUES (?, ?)', prices.items())
conn.commit()
conn.close()

print("Data fetched and stored successfully.")

[*********************100%%**********************]  3 of 3 completed
Data fetched and stored successfully.


1. Connect to the Database:
  * Use sqlite3.connect() to connect to the SQLite database file 'CommodityDatabase.db'.
  * Create a cursor object (db_cursor) to interact with the database.
2. Get Column Information:

  * Execute PRAGMA table_info(CommodityTable) to fetch information about the columns in the CommodityTable.
  * Iterate through the retrieved column information.
  * Print the column name and its data type.
3. Get Column Names:
  * Execute PRAGMA table_info(CommodityTable) again to fetch the column names.
  * Create a list (column_names) containing the names of the columns.
4. Select and Print Data:
  * Execute SELECT * FROM CommodityTable to retrieve all data from the CommodityTable.
  * Fetch the rows of data.
  * Print the column names and the retrieved data.
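5. Close Database Connection:
  * Manage the connection with a with statement. For a sqlite3 connection, the with statement only commits or rolls back the transaction, so the connection still has to be closed separately (for example with close() or contextlib.closing()).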
6. Sample Output:
  * Print the column information, names, and the retrieved data.
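
The column lookup above relies on the layout of the rows returned by PRAGMA table_info, which are tuples of (cid, name, type, notnull, dflt_value, pk). A minimal sketch on an in-memory database, so that it runs without the CommodityDatabase.db file:

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE CommodityTable (Ticker TEXT, Price REAL)")
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute("PRAGMA table_info(CommodityTable)").fetchall()

# Index 1 holds the column name and index 2 its declared type
column_names = [row[1] for row in info]
column_types = {row[1]: row[2] for row in info}
print(column_names)  # ['Ticker', 'Price']
print(column_types)  # {'Ticker': 'TEXT', 'Price': 'REAL'}
```

Because one PRAGMA call returns both names and types, the result can be reused instead of running the query twice.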

In [None]:
import sqlite3
from contextlib import closing

#To connect to the database file; closing() closes the connection when the block ends
with closing(sqlite3.connect('CommodityDatabase.db')) as db_conn:
  db_cursor = db_conn.cursor()

  #To get column information
  db_cursor.execute('PRAGMA table_info(CommodityTable)')
  columns_info = db_cursor.fetchall()

  print("Column Information:")
  for col_info in columns_info:
    col_name, col_type = col_info[1], col_info[2]
    print(f"{col_name}: {col_type}")

  print("\n")

  #To get column names
  db_cursor.execute('PRAGMA table_info(CommodityTable)')
  column_names = [col[1] for col in db_cursor.fetchall()]

  # To execute a query to select all data from the CommodityTable
  db_cursor.execute('SELECT * FROM CommodityTable')
  rows = db_cursor.fetchall()

  # To print column names
  print(f"{column_names}\n")

  # To print the retrieved data
  for row in rows:
    print(row)

Column Information:
Ticker: TEXT
Price: REAL


['Ticker', 'Price']

('Crude Oil', 73.61)
('Gold', 2105.0)
('Silver', 25.89)
