# COSC 426 / 526 - Assignment 02
### Discussed: Jan 31, 2025
### Due:  Feb 7, 2025 before 8AM ET
---
This notebook contains essential functions for your assignment. You will need to enhance and write additional code to complete the tasks. Please submit your completed work to the designated GitHub repository.


# Problem 1
This task involves processing files containing [delimiter-separated values](https://en.wikipedia.org/wiki/Delimiter-separated_values). We will focus on two formats: [comma-separated values](https://en.wikipedia.org/wiki/Comma-separated_values) (CSV) and [tab-separated values](https://en.wikipedia.org/wiki/Tab-separated_values) (TSV).

## Problem 1a: Handling Comma-Separated Values (CSV)

A CSV file, as defined by [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values), is a plain text format used to store tabular data. Each line in a CSV file represents a data record, with individual fields separated by commas. The first line usually contains headers naming each column.

For the CSV processing part of this task, you are required to:

Count and display the number of data rows in the CSV file, excluding the header row.

Count and display the number of columns in the CSV file.

Calculate and display the average age from the values in the "age" column. Assume all ages in the file are integers, but compute the average as a floating-point number.

In [69]:
def parse_delimited_file(filename, delimiter=","):
    # Open and read in all lines of the file
    # (I do not recommend readlines for LARGE files)
    # `open`: ref [1]
    # `readlines`: ref [2]
    with open(filename, 'r', encoding='utf8') as dsvfile:
        lines = dsvfile.readlines()

    # Strip off the newline from the end of each line
    # (pass "\n" explicitly so that trailing tabs or empty last fields survive)
    # Using list comprehension is the recommended pythonic way to iterate through lists
    # HINT: refs [3,4]
    lines = [line.rstrip("\n") for line in lines]

    # Split each line based on the delimiter (which, in this case, is the comma)
    # HINT: ref [5]
    lines = [line.split(delimiter) for line in lines]
    
    # Separate the header from the data
    # HINT: ref [6]
    header = lines[0]
    lines = lines[1:]


    # Find "age" within the header
    # (i.e., calculating the column index for "age")
    # HINT: ref [7]
    age_index = header.index("age")

    # Calculate the number of data rows and columns
    # HINT: [8]
    num_data_rows = len(lines)
    num_data_cols = len(lines[0])
    
    # Sum the "age" values
    # HINT: ref [9]
    age_total = 0
    for line in lines:
        age_total += int(line[age_index])
        
    # Calculate the average age
    ave_age = age_total / num_data_rows
    
    # Print the results
    # `format`: ref [10]
    print("Number of rows of data: {}".format(num_data_rows))
    print("Number of cols: {}".format(num_data_cols))
    print("Average Age: {}".format(ave_age))
    
# Parse the provided csv file
parse_delimited_file('data.csv')

Number of rows of data: 8
Number of cols: 3
Average Age: 70.875


**Expected Output:**
```
Number of rows of data: 8
Number of cols: 3
Average Age: 70.875
```
**References:**
- [1: open](https://docs.python.org/3.6/library/functions.html#open)
- [2: readlines](https://docs.python.org/3.6/library/codecs.html#codecs.StreamReader.readlines)
- [3: list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)
- [4: rstrip](https://docs.python.org/3.6/library/stdtypes.html#str.rstrip)
- [5: split](https://docs.python.org/3.6/library/stdtypes.html#str.split)
- [6: slice](https://docs.python.org/3.6/glossary.html#term-slice)
- [7: "more on lists"](https://docs.python.org/3.6/tutorial/datastructures.html#more-on-lists)
- [8: len](https://docs.python.org/3.6/library/functions.html#len)
- [9: int](https://docs.python.org/3.6/library/functions.html#int)
- [10: format](https://docs.python.org/3.6/library/stdtypes.html#str.format)
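
One limitation of splitting on the delimiter is worth noting: a CSV field may itself contain a comma when it is wrapped in quotes, and `str.split` will cut such a field in two. The sketch below, which uses made-up sample data rather than the assignment's `data.csv`, contrasts naive splitting with the standard library's `csv.reader`.

```python
import csv
import io

# Hypothetical sample: the name field contains a comma inside quotes.
sample = 'name,city,age\n"Curie, Marie",Paris,66\n'

# Naive splitting breaks the quoted name into two fields.
naive_row = sample.splitlines()[1].split(",")

# csv.reader applies the CSV quoting rules and keeps the field intact.
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

print(len(naive_row), naive_row)
print(len(data[0]), data[0])
```

For the provided `data.csv`, where no field contains the delimiter, both approaches should give the same rows.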


## Problem 1b: Analyzing Tab-Separated Values (TSV)

Based on information from [Wikipedia](https://en.wikipedia.org/wiki/Tab-separated_values), a TSV file is a straightforward text format used for storing data in a table-like structure, such as in databases or spreadsheets. In such files, each line represents a data record, and individual fields within a record are separated by tab characters. This makes TSV a specific example of the wider category of delimiter-separated values formats.

For this task, you need to apply the same analysis as you did in the previous exercise, but this time using the provided tab-delimited file.

**Important Note:** The column arrangement in this new file differs from before. If your earlier approach involved hardcoding the position of the "age" column, you'll need to revise the `parse_delimited_file` function so that it handles any file that includes an "age" column, regardless of the column's order.
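
As a minimal sketch of order-independent column lookup, the example below reads the "age" column by name with `csv.DictReader`. The helper `average_age` and both sample strings are hypothetical and only illustrate the idea; they are not the contents of the provided files.

```python
import csv
import io

def average_age(text, delimiter=","):
    """Average the "age" column, wherever it appears in the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    ages = [int(row["age"]) for row in reader]
    return sum(ages) / len(ages)

# Hypothetical samples with the same records but different column orders.
csv_text = "name,age,field\nA,30,x\nB,40,y\n"
tsv_text = "age\tfield\tname\n30\tx\tA\n40\ty\tB\n"

print(average_age(csv_text))
print(average_age(tsv_text, delimiter="\t"))
```

Looking the column up by name gives the same result as `header.index("age")` in the function above, without tracking the index by hand.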

In [70]:
# Further reading on optional arguments, like "delimiter": http://www.diveintopython.net/power_of_introspection/optional_arguments.html
parse_delimited_file('data.tsv', delimiter="\t")

Number of rows of data: 8
Number of cols: 3
Average Age: 70.875


**Expected Output:**
```
Number of rows of data: 8
Number of cols: 3
Average Age: 70.875
```


# Problem 2: Converting Unicode to ASCII in Python

Upon examining the `data.csv` file, you might have noticed names containing non-English characters. These are encoded using [Unicode](https://en.wikipedia.org/wiki/Unicode), a comprehensive standard for representing a vast array of text characters and symbols. Python 3 [natively supports](https://docs.python.org/3/howto/unicode.html) Unicode, but not all tools are compatible with it. Some require text in the [ASCII](https://en.wikipedia.org/wiki/ASCII) format.

Your task is to convert the Unicode-formatted names from the file into ASCII-formatted names and save these names into a new file named `data-ascii.txt`, placing each name on a separate line. To facilitate this conversion, use the provided [transliteration dictionary](https://german.stackexchange.com/questions/4992/conversion-table-for-diacritics-e-g-%C3%BC-%E2%86%92-ue), which maps several common Unicode characters to their ASCII equivalents. Employ this dictionary to transform the Unicode strings into ASCII format.

In [71]:
translit_dict = {
    "ä" : "ae",
    "ö" : "oe",
    "ü" : "ue",
    "Ä" : "Ae",
    "Ö" : "Oe",
    "Ü" : "Ue", 
    "ł" : "l",
    "ō" : "o",
}

with open("data.csv", 'r', encoding='utf8') as csvfile:
    lines = csvfile.readlines()

# Strip off the newline from the end of each line
# (pass "\n" explicitly so that trailing empty fields survive)
lines = [line.rstrip("\n") for line in lines]
    
# Split each line based on the delimiter (which, in this case, is the comma) 
lines = [line.split(',') for line in lines]

# Separate the header from the data
header = lines[0]
lines = lines[1:]
    
# Find "name" within the header
name_index = header.index("name")

# Extract the names from the rows
unicode_names = []
for line in lines:
    unicode_names.append(line[name_index])


# Iterate over the names
translit_names = []
for unicode_name in unicode_names:
    # Perform the replacements in the translit_dict
    # HINT: ref [1]
    for key, value in translit_dict.items():
        unicode_name = unicode_name.replace(key, value)
    
    translit_names.append(unicode_name)


# Write out the names to a file named "data-ascii.txt"
# HINT: ref [2]
with open("data-ascii.txt", 'w', encoding='utf8') as f:
    for t in translit_names:
        f.write(t + "\n")

# Verify that the names were converted and written out correctly
with open("data-ascii.txt", 'r') as infile:
    for line in infile:
        print(line.rstrip())

Richard Phillips Feynman
Shin'ichiro Tomonaga
Julian Schwinger
Rudolf Ludwig Moessbauer
Erwin Schroedinger
Paul Dirac
Maria Sklodowska-Curie
Pierre Curie


**Expected Output:**
```
Richard Phillips Feynman
Shin'ichiro Tomonaga
Julian Schwinger
Rudolf Ludwig Moessbauer
Erwin Schroedinger
Paul Dirac
Maria Sklodowska-Curie
Pierre Curie
```

**References:**
- [1: replace](https://docs.python.org/3.6/library/stdtypes.html#str.replace)
- [2: file object methods](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects)
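
As an alternative to chaining `replace` calls, the same dictionary can be turned into a translation table and applied in one pass. This is a sketch of one possible approach, not the required solution; the helper names `to_ascii` and `is_ascii` are made up for illustration.

```python
# Build a translation table from the same mapping used above.
translit_table = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ł": "l", "ō": "o",
})

def to_ascii(name):
    """Replace every mapped character in a single pass."""
    return name.translate(translit_table)

def is_ascii(text):
    """Return True when every character is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)

converted = to_ascii("Erwin Schrödinger")
print(converted, is_ascii(converted))
```

A check like `is_ascii` can flag names that contain characters missing from the dictionary before they are written to `data-ascii.txt`.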

## Problem 3. Managing files in Dataverse -- Use case

Conduct a search to find a comprehensive list of all Nobel Prizes awarded since 1901.
- Source: https://www.nobelprize.org/organization/api-examples/


Create a CSV (Comma-Separated Values) or TSV (Tab-Separated Values) file containing this data.

Use this data to create a dataset in Dataverse, ensuring all relevant metadata is accurately included.

In this Jupyter Notebook, create a series of cells (as many as necessary) to perform the following tasks:

Download the file from Dataverse.

Display the contents of the dataset.

Compute basic statistics on the dataset, including:

a) Count the number of Nobel Prizes awarded in each of the following categories: Chemistry, Economics, Literature, Peace, Physics, and Physiology or Medicine.

b) Identify the instances where Nobel Prizes were not awarded in these categories, along with the specific years when this occurred.

c) Determine how many times the Nobel Peace Prize was shared between two individuals.

d) Calculate the number of times the Nobel Prize in Physiology or Medicine was shared among three individuals.

Remember to comment your code thoroughly. Use markdown cells to explain and contextualize your work throughout the code.

### Step 1: Gather Nobel Prize data as CSV
We first need to gather our data. All the Nobel Prize data can be found at https://api.nobelprize.org/2.0/nobelPrizes, but we need to write code to put it into a CSV file for further analysis. Use their API (https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.0#/default/get_nobelPrizes) to gather all entries, especially information about the different categories, years, and laureates.
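
The API is read in pages using `offset` and `limit`, as the cell below does. The paging logic can be sketched on its own, without network access, by passing in the function that fetches a page. Here `fetch_all`, `fake_fetch`, and `fake_records` are hypothetical stand-ins, not part of the Nobel Prize API.

```python
def fetch_all(fetch_page, limit=100):
    """Collect items page by page until an empty page comes back.

    fetch_page is a hypothetical callable taking (offset, limit)
    and returning a list of records.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break  # no more data
        items.extend(page)
        offset += limit  # move on to the next batch
    return items

# Stand-in for the real API: 250 fake records served in slices.
fake_records = list(range(250))

def fake_fetch(offset, limit):
    return fake_records[offset:offset + limit]

all_items = fetch_all(fake_fetch, limit=100)
print(len(all_items))
```

In the notebook, `fetch_page` would wrap a call such as `requests.get(base_url, params={"offset": offset, "limit": limit})` and return the `nobelPrizes` list from the JSON response.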

In [72]:
## Add your code here. 
## Create a series of cells (as many as necessary)
## Remember to comment your code thoroughly. 
## Intersperse markdown cells to explain and contextualize your work throughout the notebook.

import requests
import csv

def get_all_nobel_prizes(offset=0, limit=100):
    base_url = "https://api.nobelprize.org/2.0/nobelPrizes"
    results = []
    
    while True:
        response = requests.get(f"{base_url}?offset={offset}&limit={limit}")
        
        if response.status_code == 200:
            data = response.json()
            prizes = data.get("nobelPrizes", [])
            
            #Stop when no more data is returned
            if not prizes:
                break  
            
            #Store each person's entry in list "results"
            for prize in prizes:
                award_year = prize.get("awardYear")
                category = prize.get("category", {}).get("en")
                category_full_name = prize.get("categoryFullName", {}).get("en")
                date_awarded = prize.get("dateAwarded")
                prize_amount = prize.get("prizeAmount")
                prize_amount_adjusted = prize.get("prizeAmountAdjusted")
                links = prize.get("links", {}).get("href")
                
                for laureate in prize.get("laureates", []):
                    results.append([
                        award_year,
                        category,
                        category_full_name,
                        date_awarded,
                        prize_amount,
                        prize_amount_adjusted,
                        links,
                        laureate.get("id"),
                        laureate.get("knownName", {}).get("en"),
                        laureate.get("sortOrder")
                    ])
            
            #Request next batch of results from API
            offset += limit  
        else:
            print("Failed to retrieve data")
            break
    return results



data = get_all_nobel_prizes()
with open("all_nobel_prizes.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["awardYear", "category", "categoryFullName", "dateAwarded", "prizeAmount", "prizeAmountAdjusted", "linksHref", "laureateID", "laureateName", "laureateSortOrder"])
    writer.writerows(data)

print("Data saved to all_nobel_prizes.csv")

Data saved to all_nobel_prizes.csv


In [73]:
#Just to check the contents of the csv
import pandas as pd
df = pd.read_csv("all_nobel_prizes.csv")
display(df)

Unnamed: 0,awardYear,category,categoryFullName,dateAwarded,prizeAmount,prizeAmountAdjusted,linksHref,laureateID,laureateName,laureateSortOrder
0,1901,Chemistry,The Nobel Prize in Chemistry,1901-11-12,150782,9704878,https://api.nobelprize.org/2/nobelPrize/che/1901,160,Jacobus H. van 't Hoff,1
1,1901,Literature,The Nobel Prize in Literature,1901-11-14,150782,9704878,https://api.nobelprize.org/2/nobelPrize/lit/1901,569,Sully Prudhomme,1
2,1901,Peace,The Nobel Peace Prize,1901-12-10,150782,9704878,https://api.nobelprize.org/2/nobelPrize/pea/1901,462,Henry Dunant,1
3,1901,Peace,The Nobel Peace Prize,1901-12-10,150782,9704878,https://api.nobelprize.org/2/nobelPrize/pea/1901,463,Frédéric Passy,2
4,1901,Physics,The Nobel Prize in Physics,1901-11-12,150782,9704878,https://api.nobelprize.org/2/nobelPrize/phy/1901,1,Wilhelm Conrad Röntgen,1
...,...,...,...,...,...,...,...,...,...,...
1007,2024,Peace,The Nobel Peace Prize,2024-10-11,11000000,11000000,https://api.nobelprize.org/2/nobelPrize/pea/2024,1043,,1
1008,2024,Physics,The Nobel Prize in Physics,2024-10-08,11000000,11000000,https://api.nobelprize.org/2/nobelPrize/phy/2024,1037,John J. Hopfield,1
1009,2024,Physics,The Nobel Prize in Physics,2024-10-08,11000000,11000000,https://api.nobelprize.org/2/nobelPrize/phy/2024,1038,Geoffrey Hinton,2
1010,2024,Physiology or Medicine,The Nobel Prize in Physiology or Medicine,2024-10-07,11000000,11000000,https://api.nobelprize.org/2/nobelPrize/med/2024,1035,Victor Ambros,1


You should get 1012 entries. According to the official website (https://www.nobelprize.org/prizes/lists/all-nobel-prizes/), the Nobel Prizes were "awarded 627 times to 1012 people and organisations." 

### Step 2: Put data on Dataverse
The Nobel Prize CSV file is uploaded to Harvard Dataverse and shared publicly. The link is https://doi.org/10.7910/DVN/QCHRYN. 

### Step 3: Analysis
Let's answer the following questions using the CSV file.

a) Count the number of Nobel Prizes awarded in each of the following categories: Chemistry, Economics, Literature, Peace, Physics, and Physiology or Medicine.

In [81]:
#Count the number of occurrences of a specific year and category
#Reframes df to count each category and year pair as one award
category_counts = df.groupby(["awardYear", "category"]).size().reset_index(name="num_prizes")
category_counts = category_counts.groupby("category").size().reset_index(name="num_prizes")
print(category_counts)

                 category  num_prizes
0               Chemistry         116
1       Economic Sciences          56
2              Literature         117
3                   Peace         105
4                 Physics         118
5  Physiology or Medicine         115
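
The same counting idea can be sketched with the standard library only: because the CSV has one row per laureate, each distinct (year, category) pair is counted once. The rows below are hypothetical toy data, not the real dataset.

```python
from collections import Counter

# Hypothetical per-laureate rows: (awardYear, category).
rows = [
    (1901, "Peace"), (1901, "Peace"),      # one prize, two laureates
    (1901, "Physics"),
    (1902, "Physics"), (1902, "Physics"),  # one prize, two laureates
]

# Collapse laureate rows into unique prizes, then count per category.
unique_prizes = set(rows)
prizes_per_category = Counter(category for _, category in unique_prizes)

print(dict(prizes_per_category))
```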


b) Identify the instances where Nobel Prizes were not awarded in these categories, along with the specific years when this occurred.

In [83]:
missing_awards = {}
all_years = set(range(1901, 2025))


for category in df["category"].unique():
    missing_awards[category] = []
    awarded_years = set(df[df["category"] == category]["awardYear"].unique())
    not_awarded_years = all_years - awarded_years
    
    #Add all missing years to the category
    for year in not_awarded_years:
        missing_awards[category].append(int(year))

    missing_awards[category].sort()

sorted_missing_awards = dict(sorted(missing_awards.items()))

print("Missing year per category:")
for key, value in sorted_missing_awards.items():
    print(f'{key}: {value}')

Missing year per category:
Chemistry: [1916, 1917, 1919, 1924, 1933, 1940, 1941, 1942]
Economic Sciences: [1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968]
Literature: [1914, 1918, 1935, 1940, 1941, 1942, 1943]
Peace: [1914, 1915, 1916, 1918, 1923, 1924, 1928, 1932, 1939, 1940, 1941, 1942, 1943, 1948, 1955, 1956, 1966, 1967, 1972]
Physics: [1916, 1931, 1934, 1940, 1941, 1942]
Physiology or Medicine: [1915, 1916, 1917, 1918, 1921, 1925, 1940, 1941, 1942]
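
One caveat about the list above: the prize in Economic Sciences was first awarded in 1969, so the years 1901 to 1968 are years before the prize existed rather than years in which it was withheld. A possible refinement, sketched here with toy data and a hypothetical helper `missing_years`, is to start each category's range at the first year it appears in the data.

```python
def missing_years(awarded, last_year):
    """Map each category to the years without an award,
    counting from the first year that category appears.

    awarded: dict mapping category -> set of award years.
    """
    gaps = {}
    for category, years in awarded.items():
        first_year = min(years)
        expected = set(range(first_year, last_year + 1))
        gaps[category] = sorted(expected - years)
    return gaps

# Hypothetical toy data: category "B" starts later than category "A".
sample = {
    "A": {1901, 1902, 1904, 1905},
    "B": {1903, 1904, 1905},
}
gaps = missing_years(sample, last_year=1905)
print(gaps)
```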


c) Determine how many times the Nobel Peace Prize was shared between two individuals.

In [84]:
#Filter df by "Peace" category and reframe to have counts of specific year and category
#Duplicate year and category pairs means multiple individuals
peace_prize_shared = df[(df["category"] == "Peace")].groupby(["awardYear", "category"]).size()
two_winner_count = peace_prize_shared[peace_prize_shared == 2].count()
print("Number of times the Nobel Peace Prize was shared between 2 individuals: ", two_winner_count)

Number of times the Nobel Peace Prize was shared between 2 individuals:  31


d) Calculate the number of times the Nobel Prize in Physiology or Medicine was shared among three individuals.

In [85]:
#Filter df by "Physiology or Medicine" category and reframe to have counts of specific year and category
#Duplicate year and category pairs means multiple individuals
medicine_prize_shared = df[(df["category"] == "Physiology or Medicine")].groupby(["awardYear", "category"]).size()
three_winner_count = medicine_prize_shared[medicine_prize_shared == 3].count()
print("Number of times the Nobel Prize in Physiology or Medicine was shared among 3 individuals: ", three_winner_count)

Number of times the Nobel Prize in Physiology or Medicine was shared among 3 individuals:  39
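
Parts c) and d) apply the same pattern with a different category and group size, so they can be expressed as one small function. The sketch below uses only the standard library; `count_shared` is a hypothetical helper and the rows are toy data, not real award records.

```python
from collections import Counter

def count_shared(rows, category, n_laureates):
    """Count the years in which `category` has exactly `n_laureates` rows."""
    per_year = Counter(year for year, cat in rows if cat == category)
    return sum(1 for count in per_year.values() if count == n_laureates)

# Hypothetical per-laureate rows: (awardYear, category).
rows = [
    (2001, "Peace"), (2001, "Peace"),
    (2002, "Peace"), (2002, "Peace"),
    (2003, "Peace"),
    (2001, "Physiology or Medicine"),
    (2002, "Physiology or Medicine"),
    (2002, "Physiology or Medicine"),
    (2002, "Physiology or Medicine"),
]

print(count_shared(rows, "Peace", 2))
print(count_shared(rows, "Physiology or Medicine", 3))
```

Note that this counts rows, so a prize shared with an organization is counted in the same way as one shared between individuals.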


## Problem 4. Write the comprehensive README files for Problem 3

**Note:** These directions are for a README file for your assignments. An extensive README file should be used for your project. 

***Write the comprehensive README files for Assignment 1***

A comprehensive README file on GitHub is the primary information source for anyone exploring your repository. It is essential for clearly conveying your assignment's purpose, setup, and usage.

Key elements of a comprehensive README for an assignment include:

Assignment title: This should clearly state the name of your project.

Assignment description: Provide a concise overview of what the project entails. This section should explain the project's usefulness and the problems it addresses.

Installation instructions: Offer detailed steps for setting up the project. This includes any prerequisites, dependencies, and a step-by-step guide to operationalizing the project.

Use: Give clear instructions on how to use the project. Enhance this section with practical examples, including code snippets, screenshots, or videos.

Contact information: Detail how to contact you. This could be through email.

Acknowledgments: Credit any individuals, organizations, or other entities contributing significantly to the assignment.
Use APA citation style.

**Add the README file to the GitHub repository with the solution to Problem 3.**

# Free-Form Questions:

Q1. Given that your solutions for Problems 1 and 2 likely have similar code components, you may have found yourself copying and pasting code from Problem 1 to Problem 2. To streamline this, consider refactoring the `parse_delimited_file` function so that it can be effectively utilized in both problems. Write the code below.

In [78]:
translit_dict = {
    "ä" : "ae",
    "ö" : "oe",
    "ü" : "ue",
    "Ä" : "Ae",
    "Ö" : "Oe",
    "Ü" : "Ue", 
    "ł" : "l",
    "ō" : "o",
}

def parse_delimited_file(filename, col_filter, delimiter=","):
    # Open and read in all lines of the file
    with open(filename, 'r', encoding='utf8') as dsvfile:
        lines = dsvfile.readlines()

    # Strip off the newline from the end of each line
    # (pass "\n" explicitly so that trailing tabs or empty last fields survive)
    lines = [line.rstrip("\n") for line in lines]

    # Split each line based on the delimiter (which, in this case, is the comma)
    lines = [line.split(delimiter) for line in lines]
    
    # Separate the header from the data
    header = lines[0]
    lines = lines[1:]

    # Find whatever column filter ("age", "name") within the header
    index = header.index(col_filter)

    # Calculate the number of data rows and columns
    num_data_rows = len(lines)
    num_data_cols = len(lines[0])
    
    #For Problem 1
    if col_filter == "age":
        # Sum the "age" values
        age_total = 0
        for line in lines:
            age_total += int(line[index])

        # Calculate the average age
        ave_age = age_total / num_data_rows

        # Print the results
        print("Number of rows of data: {}".format(num_data_rows))
        print("Number of cols: {}".format(num_data_cols))
        print("Average Age: {}".format(ave_age))
    

    #For Problem 2
    elif col_filter == "name":
        unicode_names = []
        for line in lines:
            unicode_names.append(line[index])
        

        translit_names = []
        for unicode_name in unicode_names:
            # Perform the replacements in the translit_dict
            for key, value in translit_dict.items():
                unicode_name = unicode_name.replace(key, value)
            
            translit_names.append(unicode_name)

        with open("data-ascii.txt", 'w', encoding='utf8') as f:
            for t in translit_names:
                f.write(t + "\n")

        with open("data-ascii.txt", 'r') as infile:
            for line in infile:
                print(line.rstrip())


#Problem 1
parse_delimited_file('data.csv', "age")

print()

#Problem 2
parse_delimited_file("data.csv", "name", delimiter=",")

Number of rows of data: 8
Number of cols: 3
Average Age: 70.875

Richard Phillips Feynman
Shin'ichiro Tomonaga
Julian Schwinger
Rudolf Ludwig Moessbauer
Erwin Schroedinger
Paul Dirac
Maria Sklodowska-Curie
Pierre Curie
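
Another way to share code between the two problems, sketched here as one possible design rather than the expected answer, is to keep parsing separate from analysis: one function returns the header and rows, and small helpers work on a named column. All function names and the sample data below are hypothetical.

```python
import io

def parse_delimited(lines, delimiter=","):
    """Split an iterable of text lines into (header, rows)."""
    table = [line.rstrip("\n").split(delimiter) for line in lines]
    return table[0], table[1:]

def column(header, rows, name):
    """Return all values from the column called `name`."""
    index = header.index(name)
    return [row[index] for row in rows]

def average(values):
    """Average a list of integer strings as a float."""
    numbers = [int(v) for v in values]
    return sum(numbers) / len(numbers)

def transliterate(text, mapping):
    """Apply every replacement in `mapping` to `text`."""
    for src, dst in mapping.items():
        text = text.replace(src, dst)
    return text

# Hypothetical sample standing in for an opened file.
sample = io.StringIO("name,age\nMössbauer,82\nDirac,82\n")
header, rows = parse_delimited(sample)

ave_age = average(column(header, rows, "age"))
names = [transliterate(n, {"ö": "oe"}) for n in column(header, rows, "name")]
print(ave_age, names)
```

With this split, the parser does not need to know which problem it serves, and new analyses can be added without touching it.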


Q2. Investigate whether there are any pre-existing Python packages that could assist in solving Problems 1 and 2. If such packages are available, modify your solutions to incorporate them, enhancing efficiency and possibly reducing the amount of custom code required.

In [79]:
import pandas as pd
from unidecode import unidecode

def parse_delimited_file(filename, col_filter, delimiter=","):
    df = pd.read_csv(filename, delimiter=delimiter)

    # Problem 1
    if col_filter == "age":
        ave_age = df["age"].mean()
        print(f"Number of rows of data: {len(df)}")
        print(f"Number of cols: {df.shape[1]}")
        print(f"Average Age: {ave_age}")

    # Problem 2
    elif col_filter == "name":
        df["name"] = df["name"].apply(unidecode)  #Convert Unicode to ASCII
        df["name"].to_csv("data-ascii.txt", index=False, header=False, encoding='utf8')

        # Print transliterated names
        with open("data-ascii.txt", 'r', encoding='utf-8') as infile:
            for line in infile:
                print(line.rstrip())

#Problem 1
parse_delimited_file('data.csv', "age")
print()
#Problem 2
parse_delimited_file("data.csv", "name", delimiter=",")

Number of rows of data: 8
Number of cols: 3
Average Age: 70.875

Richard Phillips Feynman
Shin'ichiro Tomonaga
Julian Schwinger
Rudolf Ludwig Mossbauer
Erwin Schrodinger
Paul Dirac
Maria Sklodowska-Curie
Pierre Curie


- Pandas - easy to read csv (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
- Unidecode - convert unicode to ASCII instead of manually doing it (https://pypi.org/project/Unidecode/)
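
If installing a third-party package is not an option, the standard library's `unicodedata` module offers a partial substitute. The sketch below decomposes accented characters and drops the combining marks; `strip_diacritics` is a hypothetical helper. Characters that have no decomposition, such as "ł", are dropped entirely, so this is not a full replacement for Unidecode.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose accented characters, then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_diacritics("Mössbauer"))    # Mossbauer
print(strip_diacritics("Shin'ichirō"))  # Shin'ichiro
print(strip_diacritics("Skłodowska"))   # Skodowska ("ł" is lost)
```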

# Live Chat: What we learned from 5 million books

Watch the talk by Jean-Baptiste Michel and Erez Lieberman Aiden, “What we learned from 5 million books”:
https://www.ted.com/talks/jean_baptiste_michel_erez_lieberman_aiden_what_we_learned_from_5_million_books

Then answer these questions related to the talk:

- What is the take-away of this talk? Summarize it in up to 3 sentences.

- What are metadata?

- What is an n-gram?

- What is the suppression index? 

- What is culturomics? 


Google digitized 5 million books and created Google Labs' Ngram Viewer, allowing researchers to compute statistics about books. Michel and Aiden were able to find cultural trends across centuries from word usage to censorship. More historical records are being digitized and the results will transform our understanding of language, history, and culture.

- Metadata - Information about one or more aspects of the data like where it's published.
- N-gram - A sequence of n words. In this context, they help measure cultural trends.
- Suppression index - Victims of suppression from 0 to 100. Talked about more/less than they should be.
- Culturomics - Application of massive scale data collection and analysis to the study of human culture.
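
To make the n-gram definition concrete, here is a minimal sketch that splits a sentence on whitespace and lists its n-grams. The function name `ngrams` and the example sentence are illustrative only.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in `text` as a tuple."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams("what we learned from books", 2)
print(bigrams)
```

Counting how often each n-gram occurs per publication year is the kind of statistic the Ngram Viewer plots.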

## Your reflection on this lecture and assignment:

Q1 **Resource Utilization in Problem 1:**

How many different external resources (such as websites or books) did you consult while working on Problem 1? Please list these resources.

Q2 **Tools and Packages Used in Problem 2:**

Can you quantify the number of external tools or Python packages you utilized in Problem 2? Please list these tools or packages.

Q3 **Debugging in Problem 1:**

How many times did you encounter and resolve errors or bugs in Problem 1? Detail each instance.

Q4 **Debugging in Problem 2:**

What was the total number of debugging instances in Problem 2? Please describe each instance.

Q5 **Learning and Insights:**

What are the key lessons or insights you gained from solving these problems? How do you plan to apply these insights to future coding projects or problem-solving situations?

Q6 **Collaborative Experiences:**

If you worked with others, describe how collaboration affected your approach to the problems. Could you provide an example of how you assisted a peer or how a peer's advice was beneficial to you?

**Important Note:** Your responses to these questions are as critical as your solutions for Problems 1 and 2. Brief answers or responses limited to just a few words will be considered inadequate and will negatively impact the overall grading of your assignment.

Write your answers here, enumerating them.

1.  The only resources I used for Problem 1 were the ones listed in the provided references. The links in the references all lead to docs.python.org.

2. The number of external tools and Python packages I used in the original problem is zero. For the free-form question about shortening the function, I used two packages: pandas and unidecode.

3. All of my debugging happened in problem 1a since 1b uses the same function without any changes. I tried to code without directly looking up the references so my bug fixes were all silly mistakes. For example, I forgot the "age" values are strings, so I forgot to convert them to integers. The other silly mistake was using split instead of rstrip. I did look at the references and miscounted, so I thought I had to use split to remove the new lines.

4. The debugging was even less than problem 1 since I could copy most of my code from problem 1's solution. The only silly bug was incorrectly writing the for loop for the translit_dict to replace the unicode with ASCII. I forgot that the dictionary needs .items() at the end to successfully access the key and value. 

5. Overall, these problems were not difficult. They were a nice refresher for built-in Python functions. My key lesson is that I should just read the documentation when I forget syntax. I try to brute force myself to remember how a function call is written due to my pride, but next time I should just bite the bullet and look it up. Documentation exists for a reason. Also, I'm so used to processing CSV files with pandas that it was a good exercise to do it with built-in Python functions. Staying fresh with rstrip(), index(), etc. could come in handy in future coding projects.


6. I did not work with others. If I had, I imagine collaboration would have helped most during the data collection for Problem 3. I'm not the best at using REST APIs, so I can see myself struggling to write the calls to get all the data I need. I tend to choose approaches that are simpler but less efficient, but sometimes it's better to put in more work up front to save time overall.