# Web Mining and Applied NLP (44-620)

## Python Notebooks, Basics, and Data Structures

### Student Name: Jarred Gastreich
https://github.com/jarjarredred/python-ds-nb

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure all of your code cells have been run (and have output beneath them), and ensure you have committed and pushed ALL of your changes to your assignment repository.

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

Do not use external modules (`math`, etc.) for this assignment unless you are explicitly instructed to, though you may use built-in Python functions (`min`, `max`, etc.) as you wish.

1. Modify the Markdown cell above to put your name after "Student Name:"; you will be expected to do this in all assignments presented in this format for this class.

2. Write code that divides any two numbers, stores the result in a variable, and prints the result with an appropriate label.

In [129]:
num1 = 10
num2 = 2
division_result = (num1 / num2) if num2 != 0 else "undefined (division by zero)"
print(f"The result of dividing {num1} by {num2} is: {division_result}")

The result of dividing 10 by 2 is: 5.0


3. Using loops (and potentially conditionals), write Python code that prints the factorial of each integer from 1 through 10 (which you can store in a variable if you want). The factorial of an integer is the product of all integers from 1 through that number. Print the result with an appropriate label.

In [130]:
def calculate_factorial(n):
    if not isinstance(n, int) or n < 0:
        print(f"Warning: Factorial is only defined for non-negative integers. Received: {n}")
        return None
    elif n == 0 or n == 1:
        return 1
    else:
        factorial = 1
        # Loop from 1 up to n (inclusive) to calculate the product
        for i in range(1, n + 1):
            factorial *= i
        return factorial

print("Calculating factorials for integers from 1 to 10:\n")

# Loop through numbers from 1 to 10
for number in range(1, 11):
    factorial_result = calculate_factorial(number)
    if factorial_result is not None: # Check if the factorial was successfully calculated
        print(f"The factorial of {number} is: {factorial_result}")


Calculating factorials for integers from 1 to 10:

The factorial of 1 is: 1
The factorial of 2 is: 2
The factorial of 3 is: 6
The factorial of 4 is: 24
The factorial of 5 is: 120
The factorial of 6 is: 720
The factorial of 7 is: 5040
The factorial of 8 is: 40320
The factorial of 9 is: 362880
The factorial of 10 is: 3628800
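
Because each factorial is the previous one multiplied by the next integer, the same table can also be built with a single running product instead of recomputing every product from 1. The following is a minimal sketch of that idea; the name `running_factorials` is hypothetical and is not part of the solution above.

```python
def running_factorials(limit):
    """Return [1!, 2!, ..., limit!] using one running product."""
    results = []
    product = 1
    for n in range(1, limit + 1):
        product *= n  # n! = (n - 1)! * n
        results.append(product)
    return results


for n, value in enumerate(running_factorials(10), start=1):
    print(f"The factorial of {n} is: {value}")
```

This version does one multiplication per integer, which matters little for 10 values but keeps the loop structure simple.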


4. Write a Python function that takes a single parameter and calculates and returns the average (mean) of the values in the parameter (which you may assume is iterable). Show that your function works by printing the result of calling the function on the list in the cell below.

In [131]:
def calculate_average(data):
    if not data:
        print("Warning: The input iterable is empty. Cannot calculate average.")
        return None

    total = 0
    count = 0
    for item in data:
        if isinstance(item, (int, float)):
            total += item
            count += 1
        else:
            print(f"Warning: Non-numeric value '{item}' found in data. Skipping.")

    if count == 0:
        print("Warning: No numeric values found in the iterable. Cannot calculate average.")
        return None
    else:
        return total / count
    

In [132]:
testlist = [1,-1,2,-2,3,-3,4,-4]

average_testlist = calculate_average(testlist)
print(f"The average of the list {testlist} is: {average_testlist}")

The average of the list [1, -1, 2, -2, 3, -3, 4, -4] is: 0.0
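
Since the assignment permits built-in functions, a shorter variant is possible with `sum` and `len`. The sketch below assumes every value is numeric and uses the hypothetical name `mean`. It copies the input into a list first so that generators, which have no length, are also handled.

```python
def mean(values):
    """Average of an iterable of numbers; returns None when it is empty."""
    items = list(values)  # materialize so generators can be measured with len()
    if not items:
        return None
    return sum(items) / len(items)


print(mean([1, -1, 2, -2, 3, -3, 4, -4]))
print(mean(x * x for x in range(4)))  # generator input: 0, 1, 4, 9
```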


5. Using your mean function above, write a function that calculates the variance of the list of numbers (see https://en.wikipedia.org/wiki/Variance for more information on the formula). In short:
* subtract the mean of the elements in the list from every element in the list; store these values in a new list
* square every element in the new list and sum the elements together
* divide the resulting number by N (where N is the length of the original list)

Show the result of calling your function on the lists in the code cell. You must use one or more list comprehensions or map/filter in your code.


In [133]:
def calculate_variance(data):
    if not data:
        print("Warning: The input list is empty. Cannot calculate variance.")
        return None

    mean = calculate_average(data)
    if mean is None:
        print("Error: Could not calculate mean, cannot calculate variance.")
        return None

    # Build the deviations with a list comprehension, as the task requires.
    # Non-numeric items are skipped here; calculate_average already warned about them.
    differences_from_mean = [
        item - mean for item in data if isinstance(item, (int, float))
    ]
    # mean is not None at this point, so at least one numeric value exists.
    numeric_count = len(differences_from_mean)

    squared_differences_sum = 0
    for diff in differences_from_mean:
        squared_differences_sum += diff ** 2

    variance = squared_differences_sum / numeric_count
    return variance

In [134]:
list1 = [ 5.670e-1, -1.480e+0, -5.570e-1, -1.470e+0, 7.340e-1, 1.050e+0, 4.480e-1, 2.570e-1, -1.970e+0, -1.460e+0]
list2 = [-1.780e+0, 2.640e-1, 1.160e+0, 9.080e-1, 1.780e+0, 1.080e+0, 1.050e+0, -4.630e-2, 1.520e+0, 5.350e-1]
# the variances of both lists should be relatively close to 1 (off by less than .15)

variance_list1 = calculate_variance(list1)
print(f"The variance of list1 {list1} is: {variance_list1}")

variance_list2 = calculate_variance(list2)
print(f"The variance of list2 {list2} is: {variance_list2}")

The variance of list1 [0.567, -1.48, -0.557, -1.47, 0.734, 1.05, 0.448, 0.257, -1.97, -1.46] is: 1.13973309
The variance of list2 [-1.78, 0.264, 1.16, 0.908, 1.78, 1.08, 1.05, -0.0463, 1.52, 0.535] is: 0.9257232841
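
The three bullet steps in the task (subtract the mean, square and sum, divide by N) map directly onto a list comprehension. The following is a compact, self-contained sketch of those steps. It assumes a non-empty list of numbers, and `population_variance` is a hypothetical name. It computes its own mean so the block can run alone.

```python
def population_variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # step 1: subtract the mean
    return sum(d ** 2 for d in deviations) / n  # steps 2 and 3: square, sum, divide by N


list1 = [5.670e-1, -1.480e+0, -5.570e-1, -1.470e+0, 7.340e-1,
         1.050e+0, 4.480e-1, 2.570e-1, -1.970e+0, -1.460e+0]
list2 = [-1.780e+0, 2.640e-1, 1.160e+0, 9.080e-1, 1.780e+0,
         1.080e+0, 1.050e+0, -4.630e-2, 1.520e+0, 5.350e-1]

for label, data in (("list1", list1), ("list2", list2)):
    variance = population_variance(data)
    close = abs(variance - 1) < 0.15
    print(f"{label}: variance = {variance:.4f}, within 0.15 of 1: {close}")
```

Dividing by N gives the population variance, which is what the task describes; the sample variance would divide by N - 1 instead.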


6. Create a list with at least 15 elements in it. Use list slicing to print the following:
* The first 5 elements of the list
* The last 5 elements of the list
* The list reversed (hint: show the entire list with a stride of -1)
* Every second element in the list
* Every third element in the list (stride of 3)

In [135]:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

print("Original list:", my_list)

# The first 5 elements of the list
print("First 5 elements:", my_list[:5])

# The last 5 elements of the list
print("Last 5 elements:", my_list[-5:])

# The list reversed
print("List reversed:", my_list[::-1])

# Every second element in the list
print("Every second element:", my_list[::2])

# Every third element in the list
print("Every third element:", my_list[::3])

Original list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
First 5 elements: [1, 2, 3, 4, 5]
Last 5 elements: [16, 17, 18, 19, 20]
List reversed: [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Every second element: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Every third element: [1, 4, 7, 10, 13, 16, 19]
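
"Every second element" can be read two ways: starting from the first element, as above, or taking the 2nd, 4th, 6th, and so on. The sketch below illustrates the second reading by adding a start offset to the slice. The variable `nums` is a stand-in for the list above, and which reading the assignment intends is an assumption either way.

```python
nums = list(range(1, 21))  # same values as my_list above

# A slice is written [start:stop:step]; omitted parts default to the ends of the list.
every_second_from_second = nums[1::2]  # start at index 1, step 2
every_third_from_third = nums[2::3]    # start at index 2, step 3

print("2nd, 4th, 6th, ...:", every_second_from_second)
print("3rd, 6th, 9th, ...:", every_third_from_third)
```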


7. Build a dictionary that contains the following information about this class (with appropriate names as keys):
* The name
* The course number
* The semester/term in which you are taking this course
* The number of credit hours this course counts for
* A list of the course learning objectives

The majority of this information can be found in the syllabus. Print the dictionary.

In [136]:
class_info = {
    "name": "Web Mining & Applied Natural Language Processing (NLP)",
    "course_number": "44-620 80/81",
    "semester_term": "Summer 2025 - Block 2",
    "credit_hours": "3 credits",
    "learning_objectives": [
        "L01. Manage Python libraries and packages.",
        "L02. Interact with Hosted Version Control Systems (e.g. Git and GitHub).",
        "L03. Programmatically obtain and transform data from web-based APIs and HTML pages into a usable form.",
        "L04. Describe the steps in a basic Natural Language Processing Pipeline.",
        "L05. Use preexisting tools and software libraries to perform some Natural Language Processing, such as sentiment analysis.",
        "L06. Explain results and conclusions drawn from the visualized."
    ]
}

print(class_info)

{'name': 'Web Mining & Applied Natural Language Processing (NLP)', 'course_number': '44-620 80/81', 'semester_term': 'Summer 2025 - Block 2', 'credit_hours': '3 credits', 'learning_objectives': ['L01. Manage Python libraries and packages.', 'L02. Interact with Hosted Version Control Systems (e.g. Git and GitHub).', 'L03. Programmatically obtain and transform data from web-based APIs and HTML pages into a usable form.', 'L04. Describe the steps in a basic Natural Language Processing Pipeline.', 'L05. Use preexisting tools and software libraries to perform some Natural Language Processing, such as sentiment analysis.', 'L06. Explain results and conclusions drawn from the visualized.']}
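
Printing the dictionary directly puts everything on one long line. As an illustration only, the sketch below walks the keys with `dict.items()` and prints list values one entry per line. The dictionary `course` is a shortened, hypothetical copy of `class_info`, and `format_course` is a hypothetical helper.

```python
# Shortened, hypothetical copy of class_info, used only for illustration.
course = {
    "name": "Web Mining & Applied Natural Language Processing (NLP)",
    "course_number": "44-620 80/81",
    "learning_objectives": [
        "L01. Manage Python libraries and packages.",
        "L02. Interact with Hosted Version Control Systems (e.g. Git and GitHub).",
    ],
}


def format_course(info):
    """Return one display line per scalar value and per list entry."""
    lines = []
    for key, value in info.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {entry}" for entry in value)
        else:
            lines.append(f"{key}: {value}")
    return lines


for line in format_course(course):
    print(line)
```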


8.  Given the dictionary defined in the code cell below, print the list of level 3 spells the character has.

In [137]:
player_character = {'name': 'Kitab',
                   'class': [('Cleric: Knowledge', 7)],
                   'spells': {'cantrip': ['Guidance', 'Light', 'Thaumaturgy', 'Toll the Dead', 'Word of Radiance'],
                             'level 1': ['Command', 'Detect Magic', 'Healing Word', 'Identify', 'Sleep'],
                             'level 2': ['Augury', 'Calm Emotions', 'Command', 'Invisibility', 'Lesser Restoration'],
                             'level 3': ['Mass Healing Word', 'Nondetection', 'Revivify', 'Feign Death', 'Speak with Dead'],
                             'level 4': ['Banishment', 'Confusion']}
                   }

def print_level_3_spells(player_character):
    print("Level 3 Spells:")
    for spell in player_character['spells']['level 3']:
        print(f"- {spell}")

print_level_3_spells(player_character)

Level 3 Spells:
- Mass Healing Word
- Nondetection
- Revivify
- Feign Death
- Speak with Dead
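
Indexing with `['spells']['level 3']` raises a `KeyError` if either key is missing. If the lookup needs to tolerate characters without spells at a given level, chained `dict.get` calls with defaults are one option. The sketch below uses a trimmed, hypothetical copy of the dictionary and a hypothetical helper named `spells_at`.

```python
# Trimmed, hypothetical copy of player_character, used only for illustration.
character = {
    "name": "Kitab",
    "spells": {
        "level 3": ["Mass Healing Word", "Nondetection", "Revivify",
                    "Feign Death", "Speak with Dead"],
        "level 4": ["Banishment", "Confusion"],
    },
}


def spells_at(character, level):
    """Return the spell list for a level, or an empty list if it is absent."""
    return character.get("spells", {}).get(level, [])


print(spells_at(character, "level 3"))
print(spells_at(character, "level 9"))  # missing level yields an empty list, no KeyError
```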


9. Write code to determine the number of unique elements in the list below.  You MUST use a set in finding your solution.  Print the number of unique values in the list with an appropriate label.

In [138]:
values = [10, 11, 10, 8, 1, 12, 0, 1, 6, 5, 5, 13, 6, 15, 0, 0, 1, 1, 9, 7]

unique_values = set(values)

num_unique_elements = len(unique_values)

print(f"The number of unique values in the list is: {num_unique_elements}")

The number of unique values in the list is: 12
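
A set keeps one copy of each value, so its length is the unique count. The short sketch below also lists the distinct values in sorted order and reports how many entries were duplicates, which makes the count easy to check by eye. The names `unique` and `duplicates` are chosen here for illustration.

```python
values = [10, 11, 10, 8, 1, 12, 0, 1, 6, 5, 5, 13, 6, 15, 0, 0, 1, 1, 9, 7]

unique = set(values)                    # duplicates collapse to one member each
duplicates = len(values) - len(unique)  # entries removed by the set

print(f"Distinct values: {sorted(unique)}")
print(f"Unique count: {len(unique)}, duplicate entries removed: {duplicates}")
```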


10. Create a new Jupyter Notebook (the name of the notebook should be your S number). Add a Markdown cell that contains your name. Add a Code cell and write Python that uses loops to draw the following pattern:

```
*      *
**    **
***  ***
********
```
Make sure to add and submit both the new notebook and the changes to this notebook for this assignment.
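
One way to approach the pattern: row `i` has `i` stars on each side, and the gap in the middle shrinks by two spaces per row until the two sides meet. The following is a minimal sketch under that reading. The function name `draw_pattern` and its `rows` parameter are hypothetical, and this is not necessarily the code in the separate notebook.

```python
def draw_pattern(rows=4):
    """Build the rows of the pattern: i stars, a shrinking gap, i stars."""
    width = 2 * rows  # the final row is all stars
    lines = []
    for i in range(1, rows + 1):
        stars = "*" * i
        gap = " " * (width - 2 * i)
        lines.append(stars + gap + stars)
    return lines


for line in draw_pattern():
    print(line)
```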