# DS 3000 Lab 1

Due: Friday, Sep. 20 @ 11:59 pm

### Submission Instructions
Submit this `ipynb` file to Gradescope (this can also be done via the assignment on Canvas). To ensure that your submitted `ipynb` file represents your latest code, run a fresh `Kernel > Restart & Run All` just before uploading the `ipynb` file to Gradescope.

### Tips for success
- Collaborate: bounce ideas off of each other. If you are having trouble, you can ask your classmates or Dr. Singhal for help with specific issues, however...
- Under no circumstances may one student view or share their ungraded homework or quiz with another student [(see also)](http://www.northeastern.edu/osccr/academic-integrity), i.e. you are welcome to **talk about** (*not* show each other your answers to) the problems.

In [1]:
# you may want to use the below modules on this lab
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import requests
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from bs4 import BeautifulSoup

# Part 1: Sketch and Begin Implementing a Pipeline

We wish to create a data frame that includes all the spells for each class (a "class" is something like a "wizard", or a "bard") in Dungeons and Dragons 5th Edition, which you can find [here](http://dnd5e.wikidot.com/). Your final data frame should look something like:

| Class     | Level     | Spell Name    | School      | Casting Time | Range                | Duration      | Components |
|----------:|----------:|--------------:|------------:|-------------:|---------------------:|--------------:|-----------:|
| Artificer | Cantrip   | Acid Splash   | Conjuration | 1 Action     | 60 Feet              | Instantaneous | V, S       |
| Artificer | Cantrip   | Booming Blade | Evocation   | 1 Action     | Self (5-foot radius) | 1 Round       | S, M       |
| ...       | ...       | ...           | ...         | ...          | ...                  | ...           | ...        |
| Wizard    | 9th Level | Wish          | Conjuration | 1 Action     | Self                 | Instantaneous | V          |

## Part 1.1: Poking Around (10 points)

Go to the D&D 5th Edition wiki linked above. Scroll down to the "All Spells" list and click on "Artificer Spells", spend a moment looking around at the page, then do the same for "Bard Spells", and make note of the URL of each. What do you notice about the pages and URLs that should be convenient for scraping the data we are interested in for all the different types of spells? Discuss anything else you notice about the pages that may be either tricky or convenient to deal with. Note that in our desired data frame, we include the "Class" and "Level" for each spell.

I notice that every spell page organizes its spells into tables with the same columns (spell name, school, casting time, range, duration, components), so each row gives a consistent description of a spell. This is convenient because the tables can be turned directly into a data set with proper rows and columns that is easy to manipulate in code. One tricky thing is that the components column contains commas, which may have to be stripped. Furthermore, the duration column mixes numbers and text, which may need to be converted to numbers depending on how it is used.
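The URL side of the question can be illustrated with a small sketch. It assumes the pattern seen in the link used by `get_class_spell_dict` later in this lab (`https://dnd5e.wikidot.com/spells:` followed by the lowercase class name); the constant `BASE_URL` and the short class list are illustrative choices, not a complete list of classes.

```python
# Assumed pattern: base url + lowercase class name (taken from the link used later in this lab)
BASE_URL = "https://dnd5e.wikidot.com/spells:"

# only the classes mentioned in this lab; the wiki lists more
example_classes = ["artificer", "bard", "wizard"]

# one url per class, built by simple string formatting
spell_urls = {dnd_class: f"{BASE_URL}{dnd_class}" for dnd_class in example_classes}

for dnd_class, url in spell_urls.items():
    print(dnd_class, url)
```

Because only the last part of the URL changes, a single function that takes the class name as a string should be enough to reach every class's spell page.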

## Part 1.2: Sketch the Pipeline (20 points)

First, in a markdown cell, write a bullet point list of tasks we need to get the data frame we want. I'll give you what the first bullet point should be, and you fill in the rest (there may be only one more, depending on how efficient you are in describing the tasks...):

- Write a function that takes a class (string) as an argument and returns the tables from the class's DND wiki spell page in a dictionary for each spell level
- ... 

Then, in a code cell, define **empty** functions that correspond to the tasks you identified as needing done. For example, the function for the first bullet point above might start with:

```python
def get_class_spell_dict(dnd_class):
    """ takes a D&D class (string) and gets the spell tables and saves them in a dictionary
    
    Args:
        dnd_class (str): the D&D class
        
    Returns:
        table_dict (dict): a dictionary of tables, one for each spell level
    """
    pass
```

- Create a function that loops over every class, calls `get_class_spell_dict` for each one, and combines the resulting dictionaries into a single data frame (a for loop should handle this efficiently)
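The combining step in the bullet above might look something like the sketch below. It is not the required solution: the helper name `combine_spell_tables` is hypothetical, it assumes the input is a dictionary mapping each class name to that class's dictionary of level tables, and it uses two tiny hand-built tables (rows taken from the example data frame in Part 1) instead of scraped data.

```python
import pandas as pd

def combine_spell_tables(class_tables):
    """Hypothetical helper: stacks per-class, per-level tables into one data frame.

    Args:
        class_tables (dict): maps class name -> dict mapping level -> DataFrame

    Returns:
        DataFrame: all tables stacked, with "Class" and "Level" as the first columns
    """
    frames = []
    for dnd_class, table_dict in class_tables.items():
        for level, table in table_dict.items():
            labeled = table.copy()
            # insert "Level" first, then "Class", so the final order is Class, Level, ...
            labeled.insert(0, "Level", level)
            labeled.insert(0, "Class", dnd_class)
            frames.append(labeled)
    return pd.concat(frames, ignore_index=True)

# toy input standing in for the scraped tables
toy_tables = {
    "Artificer": {"Cantrip": pd.DataFrame({"Spell Name": ["Acid Splash"], "School": ["Conjuration"]})},
    "Wizard": {"9th Level": pd.DataFrame({"Spell Name": ["Wish"], "School": ["Conjuration"]})},
}
spells = combine_spell_tables(toy_tables)
print(spells)
```

Adding the "Class" and "Level" columns before concatenating means no information is lost when the separate tables are merged into one.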

In [2]:
def dictionaries_to_df(dnd_list_of_dicts):
    """Takes a list of dictionaries of spell tables and turns them into a single dataframe

    Args:
        dnd_list_of_dicts (list): a list of dictionaries, one for each class of spells

    Returns:
        spell_df (DataFrame): a dataframe that combines the spell tables of every class
    """
    pass

## Part 1.3: Write the first function (30 points)

Go ahead and write the first function that I defined the framework for in Part 1.2, and then test it by getting the dictionary of Bard spell tables. Show that it works by printing out the head of the 2nd level Bard spells. Your final calls should be something like:

```python
bard_spell_tables = get_class_spell_dict("bard")
bard_spell_tables['Level 2'].head()
```

**Note**: depending on how you create the dictionary in your `get_class_spell_dict` function, you may not have `'Level 2'` as the key; that's fine. The top of the table should look like:

| Spell Name       | School        | Casting Time | Range                | Duration      | Components |
|-----------------:|--------------:|-------------:|---------------------:|--------------:|-----------:|
| Aid              | Abjuration    | 1 Action     | 30 Feet              | 8 hours       | V, S, M    |
| Animal Messenger | Enchantment   | 1 Action R   | 30 Feet              | 24 hours      | V, S, M    |

In [3]:
def get_class_spell_dict(dnd_class):
    """ takes a D&D class (string) and gets the spell tables and saves them in a dictionary
    
    Args:
        dnd_class (str): the D&D class
        
    Returns:
        table_dict (dict): a dictionary of tables, one for each spell level
    """
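    link = f"https://dnd5e.wikidot.com/spells:{dnd_class}"
    # pd.read_html returns one DataFrame per <table> found on the page
    tables = pd.read_html(link)
    # keep every table rather than a fixed six, so higher spell levels are not dropped;
    # this assumes the tables appear in level order, starting with cantrips (Level 0)
    table_dict = {}
    for i, table in enumerate(tables):
        table_dict[f'Level {i}'] = table
    return table_dict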

bard_spell_tables = get_class_spell_dict("bard")
bard_spell_tables['Level 2'].head()

Unnamed: 0,Spell Name,School,Casting Time,Range,Duration,Components
0,Aid,Abjuration,1 Action,30 Feet,8 hours,"V, S, M"
1,Animal Messenger,Enchantment,1 Action R,30 Feet,24 hours,"V, S, M"
2,Blindness/Deafness,Necromancy,1 Action,30 Feet,1 minute,V
3,Borrowed Knowledge,Divination,1 Action,Self,1 hour,"V, S, M"
4,Calm Emotions,Enchantment,1 Action,60 feet,"Concentration, up to 1 minute","V, S"
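The keys above (`'Level 0'`, `'Level 1'`, ...) differ from the labels in the target data frame in Part 1 (`Cantrip`, ..., `9th Level`). A minimal sketch of a conversion is below; the helper name `level_label` is hypothetical, and it only handles the spell levels 0 through 9 that appear in this lab.

```python
def level_label(index):
    """Hypothetical helper: turns a table index (0-9) into a label like 'Cantrip' or '2nd Level'."""
    if index == 0:
        return "Cantrip"
    # 1st, 2nd, 3rd are special; 4 through 9 all take "th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(index, "th")
    return f"{index}{suffix} Level"

labels = [level_label(i) for i in range(10)]
print(labels)
```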


# Part 2: Web Scraping EuroMillions Results

For this problem, we will begin to create a small data set scraped from [Euro-Millions](https://www.euro-millions.com/), which is a lottery that is played across nine European countries. Draws take place on Tuesday and Friday evenings with a minimum guaranteed jackpot of €17 million. **The rest of this problem is continued on Homework 2**.

## Part 2.1: The Scraper Function (20 points)

Complete the function `get_lottery_html()` below (including docstring) which visits the lottery results for a specific date and grabs the html. Visit [the website](https://www.euro-millions.com/results/) to select a date or two and notice the pattern in the url so that you can pass any date to the function as a string. 

**Make sure to remove the `pass` statement when you are finished**. I have written the code you should run once the function is completed, getting the lottery results from the last day in April.

In [4]:
def get_lottery_html(date):
    """Gets the HTML of the EuroMillions results page for a given draw date

    Args:
        date (str): the draw date in DD-MM-YYYY format (e.g. '13-09-2024')

    Returns:
        html (str): the HTML text of the results page for that date
    """
    main_link = "https://www.euro-millions.com/results/"
    # the results page for a draw is the base results url followed by the date
    searching_link = f"{main_link}{date}"
    result = requests.get(searching_link)
    return result.text
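Since `datetime` is already imported at the top of the lab, the date string could also be built from a date object rather than typed by hand. This is only a sketch: the helper name `build_results_url` is hypothetical, and the DD-MM-YYYY pattern is an assumption based on the results URL used in the next cell (`.../results/13-09-2024`).

```python
from datetime import datetime

RESULTS_URL = "https://www.euro-millions.com/results/"

def build_results_url(draw_date):
    """Hypothetical helper: formats a datetime as DD-MM-YYYY and appends it to the results url."""
    return f"{RESULTS_URL}{draw_date.strftime('%d-%m-%Y')}"

example_url = build_results_url(datetime(2024, 9, 13))
print(example_url)
```

Formatting with `strftime` keeps the leading zeros (`09`, not `9`), which the URL pattern appears to require.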

In [5]:
# when you are done the following code should be run
url_text = get_lottery_html('13-09-2024')
url_text # uncomment to see if it worked

'\r\n<!DOCTYPE html>\r\n<html lang="en">\r\n<head>\r\n\r\n\t<title>EuroMillions Results for Friday 13th September 2024 - Draw 1772</title>\r\n\t<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\r\n\t<meta name="description" content="View the EuroMillions results including prize breakdown, HotPicks numbers and Millionaire Maker codes for Friday 13th September 2024.">\r\n\t<meta name="keywords" content="euromillions results 13-09-2024, 13th september 2024, draw number 1772">\r\n\t<meta name="author" content="Euro-Millions.com">\r\n\t<meta name="format-detection" content="telephone=no">\r\n\t<meta name="HandheldFriendly" content="True">\r\n\t<meta name="viewport" content="width=device-width, initial-scale=1">\r\n\t\r\n\t<link rel="alternate" hreflang="x-default" href="https://www.euro-millions.com/results/13-09-2024">\r\n<link rel="alternate" hreflang="fr" href="https://www.euro-millions.com/fr/resultats/13-09-2024">\r\n<link rel="alternate" hreflang="de-AT" href="https:

## Part 2.2: The Soup Function (20 points)

Complete the function `get_country_soup()` below (including docstring) which takes the html from the previous function and outputs one of nine beautiful soup objects, depending on the country you are interested in as defined by the `'id'` attribute:

    - `id='PrizeAT'` (Austria)
    - `id='PrizeBE'` (Belgium)
    - `id='PrizeFR'` (France)
    - `id='PrizeIE'` (Ireland)
    - `id='PrizeLU'` (Luxembourg)
    - `id='PrizePT'` (Portugal)
    - `id='PrizeES'` (Spain)
    - `id='PrizeCH'` (Switzerland)
    - `id='PrizeGB'` (UK)
    
The function should take two arguments: the html object from `get_lottery_html()` and a string that specifies the `id` you are interested in (by default, Belgium or `BE`).
    
**Make sure to remove the `pass` statement when you are finished.** Then, also make sure to run the code to ensure your function works.

In [6]:
def get_country_soup(html, country='BE'):
    """Searches the lottery HTML for the prize section of one country, identified by its 'id' attribute

    Args:
        html (str): the HTML text returned by get_lottery_html()
        country (str): the two-letter country code that follows 'Prize' in the id (default 'BE')

    Returns:
        section (bs4.element.ResultSet): list-like collection of the tags whose id is 'Prize' + country
    """
    # name the parser explicitly so the result does not depend on which parsers are installed
    soup_setup = BeautifulSoup(html, 'html.parser')
    section = soup_setup.find_all(id=f'Prize{country}')
    return section
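The difference between `find_all` and `find` matters for what this function hands back. The sketch below uses a small made-up HTML string (not the real page) with the same `id` naming scheme to show both calls.

```python
from bs4 import BeautifulSoup

# made-up stand-in for the lottery page, using the same id naming scheme
sample_html = """
<div id="PrizeBE"><table><tr><th>Numbers Matched</th></tr></table></div>
<div id="PrizeFR"><table><tr><th>Numbers Matched</th></tr></table></div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like ResultSet, even when there is only one match
all_matches = sample_soup.find_all(id="PrizeBE")

# find returns the first matching tag itself, or None when nothing matches
first_match = sample_soup.find(id="PrizeBE")
no_match = sample_soup.find(id="PrizeXX")

print(len(all_matches), first_match["id"], no_match)
```

If a single tag is wanted for later steps, either call `find` or take the first element of the `find_all` result.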

In [7]:
# when you are done the following code should be run (feel free to change the country if you wish)
country_choice = 'BE'
my_country_soup = get_country_soup(url_text, country_choice)
my_country_soup #uncomment to see if it worked

[<div id="PrizeBE">
 <table class="mobFormat">
 <thead>
 <tr>
 <th>Numbers Matched</th>
 <th class="righty">Prize Per Winner</th>
 <th class="righty">Belgian Winners</th>
 <th class="righty">Prize Fund Amount</th>
 <th class="righty">Total Winners</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td class="colour" data-title="Numbers Matched">
 <span class="prizeName"><span class="ball">5 </span> + <span class="star"> 2</span></span>
 </td>
 <td class="righty" data-title="Prize Per Winner">
 											€31,511,704.35
 											
 										</td>
 <td class="righty" data-title="Belgian Winners">
 												0
 												
 											</td>
 <td class="righty" data-title="Prize Fund Amount">
 												€0.00
 												
 											</td>
 <td class="righty" data-title="Total Winners">
 <strong style="color:#F00">Rollover!</strong> 
 												0
 												
 										</td>
 </tr>
 <tr>
 <td class="colour" data-title="Numbers Matched">
 <span class="prizeName"><span class="ball">5 </span>