# PROGRES - TME2

Fabien Mathieu - fabien.mathieu@normalesup.org

Sébastien Tixeuil - Sebastien.Tixeuil@lip6.fr

**Binôme : Ahcene LOUBAR, Omar REZKELLAH**

**Note**: 
- Star exercises (indicated by *) should only be done if all other exercises have been completed. You 
don't have to do them if you do not want.

# Rules

1. Cite your sources
2. One file to rule them all
3. Explain
4. Execute your code


https://github.com/balouf/progres/blob/main/rules.ipynb

# Exercice 1 - Regular Expressions

Consider the following list:

In [266]:
L = ['marie.Dupond@gmail.com', 'lucie.Durand@wanadoo.fr', 
'Sophie.Parmentier @@ gmail.com', 'franck.Dupres.gmail.com', 
'pierre.Martin@lip6 .fr ',' eric.Deschamps@gmail.com '] 

- Which of these entries are valid?
- Use regular expressions to identify valid *gmail* addresses and display them. 

Answer

The valid entries are ```marie.Dupond@gmail.com```, ```eric.Deschamps@gmail.com``` as they were the only ones that the function true_gmail returned. The function takes a list of strings representing supossedly email adresses, but not all of them are valid, so to make sure we identify the valid ones we'll use the library re (RegEx) to try and match a pattern to each one of them and see which ones are valid. 

1. The function takes a mail_list: a list containing strings that supposedly represent e-mail adresses
2. makes sure that all strings in the list are striped from spaces at the start and at the end. 
3. Defines a pattern: 
* ```[a-zA-Z0-9._%+-]+``` : represents the local part of an e-mail address and it can contain all the uppercase and lowercase letters of the alphabet + digits from 0 to 9 and some special characters.
* ```@gmail``` : represents an @ followed by gmail simply.
* ```\.[a-zA-Z]{2,}``` : represents a dot followed by a minimum of two elements that can be all the uppercase and lowercase letters of the alphabet.
4. Uses the fullmatch function of the re module to check if each email string present on the emails list matches the pattern.
5. Returns the list of valid emails.

In [267]:
import re 
def true_gmail(mail_list):
    mail_list = [mail.strip() for mail in mail_list]
    pattern_email = r'[a-zA-Z0-9._%+-]+@gmail\.[a-zA-Z]{2,}'
    valid_emails = [email for email in mail_list if re.fullmatch(pattern_email, email)]
    return valid_emails


In [268]:
true_gmail(L)

['marie.Dupond@gmail.com', 'eric.Deschamps@gmail.com']

I heard someone talk about the issue below before, so i guess i have to find a way 

In [269]:
true_gmail([".@gmail.com"])

['.@gmail.com']

A solution to this would be to put the ```pattern = r'[a-zA-Z0-9_%+-][a-zA-Z0-9._%+-]+[a-zA-Z0-9_%+-]@gmail\.[a-zA-Z]{2,}'```  would ensure that email strings don't start or end with a dot even tho this is still a bit clumsy i guess. 
I got this information from a fellow student. 

In [270]:
print(re.fullmatch(r'[a-zA-Z0-9_%+-][a-zA-Z0-9._%+-]+[a-zA-Z0-9_%+-]@gmail\.[a-zA-Z]{2,}', ".@gmail.com")) 

None


- Use regular expressions to check if a string ends with a number. 

We want to detect if a string ends with a number so we'll just directly use re.search() and a pattern that matches at least one digit at the end of the string:
* ```\d+``` : at least one digit
* ```$``` : at the end of the string 

In [272]:
def ends_with_number(txt):
    return bool(re.search(r'\d+$', txt))

In [273]:
ends_with_number('to42to')

False

In [274]:
ends_with_number('to42to666')

True

- Use regular expressions to remove problematic zeros from an IPv4 address expressed as a 
string. (example: "216.08.094.196" should become "216.8.94.196", but "216.80.140.196" 
should remain "216.80.140.196"). 

The point of the normalize_ip() function is to remove leading zeros from complete octets (digits that are delimited by dots) and for that we have to, at first, make a pattern that only cares about complete octets (or words of digits) and for that we can use the special sequence "\b", because it only matches a word that starts with a word character and is delimited at the end by a non-word character such as a dot!

we are interested only in the octets that start with one or two zeros, and then make sure we replace the whole octet with what comes after the zero(s):

* ```\b0+(\d)``` : this unsures we match octets that start with at least one zero followed by a group of digits delimited by a dot.
* ```r'\1'```: refers to the first capture group from the pattern (\d) the actual digit after the leading zero.
* ```re.sub()``` : we use this function to replace octets that starts with zeros by the digits that come right after the zeros.

I used this link https://ttl255.com/regex-and-unix-tools-for-networking-basics/

I also asked gemini about the "\b" special sequence, the response was : "\b est un délimiteur de mot (appelé "word boundary" en anglais). Il ne correspond à aucun caractère particulier, mais marque une position entre : Un caractère "de mot" (comme une lettre, un chiffre ou un underscore _) et un caractère "non-mot" (comme un espace, un point, une virgule, etc.)."

In [275]:
def normalize_ip(txt):
    return re.sub(r'\b0+(\d)', r'\1', txt)

In [276]:
normalize_ip("216.0.094.196")

'216.0.94.196'

In [277]:
normalize_ip("216.08.094.196")

'216.8.94.196'

In [278]:
normalize_ip("216.80.140.196")

'216.80.140.196'

- Use regular expressions to transform a date from MM-DD-YYYY format to DD-MM-YYYY 
format. (example "11-06-2020" should become "06-11-2020"). Optionally*, do the same thing using the `datetime` package.

All we had to do is caplture three groups for day, month and year and then rearrange them by 2nd group with the first one and vice versa. 
* ```(\d{2})-(\d{2})-(\d{4})``` : captures a structure that has the MM-DD-YYYY shape 
* ```re.sub()``` : used to replace (or rearrange in our case)
* ```r'\2-\1-\3'``` : rearranging the order of the first two groups (replacing the first with the second and the second with the first)

In [279]:
def switch_md(txt):
    pattern = r'(\d{2})-(\d{2})-(\d{4})'
    return re.sub(pattern, r'\2-\1-\3', txt)

In [280]:
switch_md("11-06-2020")

'06-11-2020'

# Exercice 2 - Analyze XML

- Write a Python code that retrieves the content of the page at:

In [48]:
import xml.etree.ElementTree as ET
import requests
url = "https://www.w3schools.com/xml/cd_catalog.xml"

We're doing web scraping for the next few exercices so we are gonna use the requests library a whole bunch, it allows us to acces web data in different formats (XML, HTML... etc)
I'm not gonna be explaining why i used certain libraries since i refered to Mr Fabien Mattieu's lecture to resolve these exercices.

In [49]:
response = requests.get(url)
xml_content = response.content
xml_content

b'<?xml version="1.0" encoding="UTF-8"?>\n<CATALOG>\n  <CD>\n    <TITLE>Empire Burlesque</TITLE>\n    <ARTIST>Bob Dylan</ARTIST>\n    <COUNTRY>USA</COUNTRY>\n    <COMPANY>Columbia</COMPANY>\n    <PRICE>10.90</PRICE>\n    <YEAR>1985</YEAR>\n  </CD>\n  <CD>\n    <TITLE>Hide your heart</TITLE>\n    <ARTIST>Bonnie Tyler</ARTIST>\n    <COUNTRY>UK</COUNTRY>\n    <COMPANY>CBS Records</COMPANY>\n    <PRICE>9.90</PRICE>\n    <YEAR>1988</YEAR>\n  </CD>\n  <CD>\n    <TITLE>Greatest Hits</TITLE>\n    <ARTIST>Dolly Parton</ARTIST>\n    <COUNTRY>USA</COUNTRY>\n    <COMPANY>RCA</COMPANY>\n    <PRICE>9.90</PRICE>\n    <YEAR>1982</YEAR>\n  </CD>\n  <CD>\n    <TITLE>Still got the blues</TITLE>\n    <ARTIST>Gary Moore</ARTIST>\n    <COUNTRY>UK</COUNTRY>\n    <COMPANY>Virgin records</COMPANY>\n    <PRICE>10.20</PRICE>\n    <YEAR>1990</YEAR>\n  </CD>\n  <CD>\n    <TITLE>Eros</TITLE>\n    <ARTIST>Eros Ramazzotti</ARTIST>\n    <COUNTRY>EU</COUNTRY>\n    <COMPANY>BMG</COMPANY>\n    <PRICE>9.90</PRICE>\n    <Y

In [50]:
root = ET.fromstring(xml_content)
root

<Element 'CATALOG' at 0x0000026930E6DBC0>

- Look at the text content and load as xml.

The XML data is represented by a root element CATALOG that contains all the CDs (elements) in the collection.
Each CD is represented by a <CD> tag and inside each <CD> element, there are several sub-elements describing various information about the CD such as the <TITLE> or <ARTIST> and a few more. 

- Write a `display_cd` function that displays (i.e. `print`), for a CD: title, artist, country, company, year.
- Display all CDs.

we'll first define a function ```display_cd(entry)``` that takes an xml CD element  and displays it's sub elements and their contents 

```ET.fromstring(xml_content)``` converts the XML string into an element tree.
```root.findall('CD')``` gets a list of all CD elements.
```The display_cd(cd)``` function is then called for each CD element, extracting and printing its details.

In [51]:
def display_cd(entry):
    title = entry.find('TITLE').text
    artist = entry.find('ARTIST').text
    country = entry.find('COUNTRY').text
    company = entry.find('COMPANY').text
    year = entry.find('YEAR').text
    
    print(f"Title: {title}")
    print(f"Artist: {artist}")
    print(f"Country: {country}")
    print(f"Company: {company}")
    print(f"Year: {year}")
    print("-" * 40)

In [52]:
for child in root:
    display_cd(child)

Title: Empire Burlesque
Artist: Bob Dylan
Country: USA
Company: Columbia
Year: 1985
----------------------------------------
Title: Hide your heart
Artist: Bonnie Tyler
Country: UK
Company: CBS Records
Year: 1988
----------------------------------------
Title: Greatest Hits
Artist: Dolly Parton
Country: USA
Company: RCA
Year: 1982
----------------------------------------
Title: Still got the blues
Artist: Gary Moore
Country: UK
Company: Virgin records
Year: 1990
----------------------------------------
Title: Eros
Artist: Eros Ramazzotti
Country: EU
Company: BMG
Year: 1997
----------------------------------------
Title: One night only
Artist: Bee Gees
Country: UK
Company: Polydor
Year: 1998
----------------------------------------
Title: Sylvias Mother
Artist: Dr.Hook
Country: UK
Company: CBS
Year: 1973
----------------------------------------
Title: Maggie May
Artist: Rod Stewart
Country: UK
Company: Pickwick
Year: 1990
----------------------------------------
Title: Romanza
Artist: A

In [53]:
#gemini wrote this one 
for cd in root.findall('CD'):
    display_cd(cd)

Title: Empire Burlesque
Artist: Bob Dylan
Country: USA
Company: Columbia
Year: 1985
----------------------------------------
Title: Hide your heart
Artist: Bonnie Tyler
Country: UK
Company: CBS Records
Year: 1988
----------------------------------------
Title: Greatest Hits
Artist: Dolly Parton
Country: USA
Company: RCA
Year: 1982
----------------------------------------
Title: Still got the blues
Artist: Gary Moore
Country: UK
Company: Virgin records
Year: 1990
----------------------------------------
Title: Eros
Artist: Eros Ramazzotti
Country: EU
Company: BMG
Year: 1997
----------------------------------------
Title: One night only
Artist: Bee Gees
Country: UK
Company: Polydor
Year: 1998
----------------------------------------
Title: Sylvias Mother
Artist: Dr.Hook
Country: UK
Company: CBS
Year: 1973
----------------------------------------
Title: Maggie May
Artist: Rod Stewart
Country: UK
Company: Pickwick
Year: 1990
----------------------------------------
Title: Romanza
Artist: A

- Display all 1980s CDs. 

For this function we should think about accessing all CDs through a for loop and use the previous function to display only the cds that were released on the 80s (between 1980 and 1990)
i made sure i converted the ```find('YEAR')``` to an integer to use for inequalities, befause the find and find all methods return strings. 

In [54]:
def display_80s_cds(root):
    for cd in root.findall('CD'):
        year = int(cd.find('YEAR').text)
        if 1980 <= year <= 1989:
            display_cd(cd)

display_80s_cds(root)

Title: Empire Burlesque
Artist: Bob Dylan
Country: USA
Company: Columbia
Year: 1985
----------------------------------------
Title: Hide your heart
Artist: Bonnie Tyler
Country: UK
Company: CBS Records
Year: 1988
----------------------------------------
Title: Greatest Hits
Artist: Dolly Parton
Country: USA
Company: RCA
Year: 1982
----------------------------------------
Title: When a man loves a woman
Artist: Percy Sledge
Country: USA
Company: Atlantic
Year: 1987
----------------------------------------
Title: Stop
Artist: Sam Brown
Country: UK
Company: A and M
Year: 1988
----------------------------------------
Title: Bridge of Spies
Artist: T'Pau
Country: UK
Company: Siren
Year: 1987
----------------------------------------
Title: Private Dancer
Artist: Tina Turner
Country: UK
Company: Capitol
Year: 1983
----------------------------------------
Title: Midt om natten
Artist: Kim Larsen
Country: EU
Company: Medley
Year: 1983
----------------------------------------
Title: Picture book

- Display all British CDs.

This one is not so different from the previous one as we just have to access all CDs one by one through a for loop and check at each iteration is the country is "UK" and display it if it is the case.

In [55]:
def display_brittish_cds(root):
    for cd in root.findall('CD'):
        country = cd.find('COUNTRY').text
        if country == 'UK' :
            display_cd(cd)

display_brittish_cds(root)

Title: Hide your heart
Artist: Bonnie Tyler
Country: UK
Company: CBS Records
Year: 1988
----------------------------------------
Title: Still got the blues
Artist: Gary Moore
Country: UK
Company: Virgin records
Year: 1990
----------------------------------------
Title: One night only
Artist: Bee Gees
Country: UK
Company: Polydor
Year: 1998
----------------------------------------
Title: Sylvias Mother
Artist: Dr.Hook
Country: UK
Company: CBS
Year: 1973
----------------------------------------
Title: Maggie May
Artist: Rod Stewart
Country: UK
Company: Pickwick
Year: 1990
----------------------------------------
Title: For the good times
Artist: Kenny Rogers
Country: UK
Company: Mucik Master
Year: 1995
----------------------------------------
Title: Tupelo Honey
Artist: Van Morrison
Country: UK
Company: Polydor
Year: 1971
----------------------------------------
Title: The very best of
Artist: Cat Stevens
Country: UK
Company: Island
Year: 1990
----------------------------------------
Tit

# Exercice 3 - Analyze JSON

- Write a Python program that gets the file of filming locations in Paris at: 

In [61]:
url = "https://opendata.paris.fr/explore/dataset/lieux-de-tournage-a-paris/download/?format=json&timezone=Europe/Berlin&lang=fr"

- How many entries have you got?

https://pynative.com/python-json-load-and-loads-to-parse-json/ i looked at this page to know more about json parsing and serialisation and came to know about json.loads() etc 

we've got 12265 entries in our json file, which means we have informations about 12265 filming location in paris

In [62]:
import json
response = requests.get(url)
locs = json.loads(response.text)
print("total number of entries is :", len(locs))
locs

total number of entries is : 12265


[{'datasetid': 'lieux-de-tournage-a-paris',
  'recordid': '0ff321c5b140a12a8e50a1b212a7c5f5bced91d7',
  'fields': {'coord_x': 2.37006242,
   'id_lieu': '2017-751',
   'adresse_lieu': 'rue du faubourg du temple, 75011 paris',
   'geo_shape': {'coordinates': [2.370062415669748, 48.8696979988026],
    'type': 'Point'},
   'coord_y': 48.869698,
   'ardt_lieu': '75011',
   'nom_tournage': '2 Fils (Nouvelle Demande Décor Librairie / Journées interverties)',
   'nom_realisateur': 'Félix MOATI',
   'date_debut': '2017-10-19',
   'type_tournage': 'Long métrage',
   'annee_tournage': '2017',
   'nom_producteur': 'NORD OUEST FILMS',
   'date_fin': '2017-10-19',
   'geo_point_2d': [48.8696979988026, 2.370062415669748]},
  'geometry': {'type': 'Point',
   'coordinates': [2.370062415669748, 48.8696979988026]},
  'record_timestamp': '2024-01-31T13:40:46.402+01:00'},
 {'datasetid': 'lieux-de-tournage-a-paris',
  'recordid': 'a81ecdaf39edd9535d1ccd61d7f32b3c4971e8f1',
  'fields': {'coord_x': 2.34248745

- Analyze the JSON file: what is its structure?
- Write a function that converts an entry in a string that shows director, title, district, start date, end date, and geographic coordinates.
- Convert all entries in strings (warning: some entries may have issues).
- Display the first 20 entries.

I basically just loaded the json file content into a list where every element is an entry, and each entry represents a filming location. Every element has a dictionary like structure (just like every json file since it uses the key:value concept) which we can see above in the first element i displayed. 

i ran into so many keyerrors while trying to do this function, i had to ask chatgpt and it gave me the .get() method that helped handeling missing values by replacing them with strings that say "unknown director" for a missing nom_réalisateur value.

In [63]:
def display_loc(entry):
    return f'"{entry["fields"]["nom_tournage"]}" by {entry["fields"].get("nom_realisateur", "Unknown director")}, from {entry["fields"].get("date_debut", "Unknown data")} to {entry["fields"].get("date_fin", "Unknown data")}, in {entry["fields"].get("ardt_lieu", "Unknown place")} ({entry["fields"]["geo_point_2d"][0]}, {entry["fields"]["geo_point_2d"][1]})'

In [64]:
all_entries = [display_loc(e) for e in locs]
print('\n'.join(all_entries[:20]))

"2 Fils (Nouvelle Demande Décor Librairie / Journées interverties)" by Félix MOATI, from 2017-10-19 to 2017-10-19, in 75011 (48.8696979988026, 2.370062415669748)
"Vernon Subutex" by Cathy Verney, from 2018-04-25 to 2018-04-26, in 75001 (48.85849330754624, 2.342487451056846)
"LEBOWITZ CONTRE LEBOWITZ 2" by Olivier Barma, from 2017-06-01 to 2017-06-01, in 75010 (48.87597363622085, 2.3646350537874703)
"À jamais fidèle" by cheyenne carron, from 2017-08-24 to 2017-08-25, in 75020 (48.85154733669693, 2.3986003434290892)
"CHRONIQUES PARISIENNES 16" by ZABOU BREITMAN, from 2017-04-18 to 2017-04-18, in 75013 (48.82655664927356, 2.3812794291774626)
"LOLYWOOD - DANS TES REVES LE SPORT" by Matthieu MARES-SAVELLI, from 2017-04-13 to 2017-04-13, in 75019 (48.893005176977596, 2.3977875069644603)
"Un homme pressé" by Hervé Mimran, from 2017-05-23 to 2017-05-24, in 75012 (48.84258570428103, 2.3691360159868426)
"L'AMOUR EST UNE FÊTE" by Cédric ANGER, from 2017-06-14 to 2017-06-14, in 75018 (48.882670376

- A same movie can have multiple shooting locations. Make a list of movies, where each entry contains the movie title, its director, and shootings locations (district, start date, end date).
- How many movies do you have?
- Write a function that converts a movie into a string that shows director, title, and shootings.
- Convert all movies in strings.
- Display the first 20 entries.

What i understood from this part was that we had to create new data about the movies rather than locations using the locs list and so we had to have for each movie that exists in our database the director, movie title and the different locations it was shot in. The issue was that we knew from previous parts of this exercice that this database had a few (or a lot) of missing data for certain fields so we know we'll have to handle them later after we get a lot of key errors. 

An approach is to create a dictionary where each key is a movie title and then each value is a dictionary where the keys are 'director' that has a string value, 'locations' whose value is a list of tuples that look like: (district, start date, end date). The first thing that comes to mind is to naively use a for loop that loops through locs and and fills the previously declared dictionary and check at the start of every iteration if the movie is already in the dictionary, if yes then we'll just have to append the new location and the start and end date as a tuple in the locations list. else we just create a new movie key and fill it. 



In [66]:
movies = dict()

for entry in locs:
    movie_title = entry['fields']['nom_tournage']
    if movie_title not in movies:
        if 'nom_realisateur' in entry['fields']:
            movies[movie_title] = {'title': movie_title,
            'director': entry['fields']['nom_realisateur'],
            'locations': []
        }
        else:
            movies[movie_title] = {'title': movie_title,
                    'director': 'Inconnu',
                    'locations': []
        }
    if 'ardt_lieu' in entry['fields']:
        movies[movie_title]['locations'].append((entry['fields']['ardt_lieu'], 
                                                entry['fields'].get('date_debut', 'Date inconnue'),
                                                entry['fields'].get('date_fin', 'Date inconnue')
        ))
    #movies[movie_title]['locations'].append(( entry['fields']['ardt_lieu'], entry['fields']['date_debut'], entry['fields']['date_fin']))
  
movies = [m for m in movies.values()]

As predicted many key errors occured due to missing data, and i handled them with simple if statements and the .get() method. 

In [67]:
movies

[{'title': '2 Fils (Nouvelle Demande Décor Librairie / Journées interverties)',
  'director': 'Félix MOATI',
  'locations': [('75011', '2017-10-19', '2017-10-19'),
   ('75011', '2017-10-19', '2017-10-19')]},
 {'title': 'Vernon Subutex',
  'director': 'Cathy Verney',
  'locations': [('75001', '2018-04-25', '2018-04-26'),
   ('75019', '2018-05-22', '2018-05-22'),
   ('75019', '2018-05-25', '2018-05-25'),
   ('75010', '2018-05-03', '2018-05-06'),
   ('75011', '2018-03-19', '2018-03-19'),
   ('75011', '2018-06-01', '2018-06-02'),
   ('75014', '2018-06-05', '2018-06-14'),
   ('75014', '2018-06-13', '2018-06-13'),
   ('75019', '2018-05-22', '2018-05-22'),
   ('75009', '2018-04-11', '2018-04-11'),
   ('75007', '2018-06-13', '2018-06-15'),
   ('75012', '2018-03-23', '2018-03-23'),
   ('75011', '2018-04-04', '2018-04-04'),
   ('75016', '2018-03-30', '2018-03-30'),
   ('75004', '2018-03-20', '2018-03-20'),
   ('75004', '2018-04-26', '2018-04-27'),
   ('75012', '2018-08-28', '2018-08-30'),
   ('7

In [68]:
len(movies)

1476

I asked chat gpt to tell me what to do to display the locations cause as easy as it looks (and actually is) i couldn't figure out how to loop inside the join, i didn't even know it was possible. 

In [69]:
def display_movie(movie):
    return f'"{movie["title"]}", by {movie["director"]}. Shootings: {"; ".join(f"{location[0]} from {location[1]} to {location[2]}" for location in movie["locations"])}'

In [70]:
all_movie_displays = [display_movie(m) for m in movies]
print('\n'.join(all_movie_displays[:20]))

"2 Fils (Nouvelle Demande Décor Librairie / Journées interverties)", by Félix MOATI. Shootings: 75011 from 2017-10-19 to 2017-10-19; 75011 from 2017-10-19 to 2017-10-19
"Vernon Subutex", by Cathy Verney. Shootings: 75001 from 2018-04-25 to 2018-04-26; 75019 from 2018-05-22 to 2018-05-22; 75019 from 2018-05-25 to 2018-05-25; 75010 from 2018-05-03 to 2018-05-06; 75011 from 2018-03-19 to 2018-03-19; 75011 from 2018-06-01 to 2018-06-02; 75014 from 2018-06-05 to 2018-06-14; 75014 from 2018-06-13 to 2018-06-13; 75019 from 2018-05-22 to 2018-05-22; 75009 from 2018-04-11 to 2018-04-11; 75007 from 2018-06-13 to 2018-06-15; 75012 from 2018-03-23 to 2018-03-23; 75011 from 2018-04-04 to 2018-04-04; 75016 from 2018-03-30 to 2018-03-30; 75004 from 2018-03-20 to 2018-03-20; 75004 from 2018-04-26 to 2018-04-27; 75012 from 2018-08-28 to 2018-08-30; 75011 from 2018-03-19 to 2018-03-19; 75002 from 2018-03-20 to 2018-03-20; 75009 from 2018-04-09 to 2018-04-10; 75010 from 2018-08-28 to 2018-08-30; 75012 fr

- Display for each district its number of shootings. 

The only way i can think about is just to get all the districts present on the locs file, even if there are duplicates but that's not a problem since we can use the set() function to get distinct districts. we will create a dictionary and make the districts set values as the dictionary keys and initialise all values to 0, just like we do with simple variables when we want to make a counter to use inside a loop. 
Then we loop through the locs elements and for every element, if the 'ardt_lieu' key exists then add +1 to the value of the key in the stats dictionarry corresponding to it's value in the locs element and that way we get our frequencies dictionary. 

Now a problem raises, is that i'm not quite sure how to order dictionaries by descending order (more like i can't remember).

In [71]:
districts = []
for m in locs:
    if 'ardt_lieu' in m['fields']:
        districts.append(m["fields"]["ardt_lieu"])
districts = set(districts)

stats = {district: 0 for district in districts}
for m in locs: 
    if 'ardt_lieu' in m['fields']:
        stats[m["fields"]["ardt_lieu"]] += 1


i looked up on google how to use the lambda function to sort dictionaries (i knew from previous data science projets that it was used often for tasks like these): https://www.freecodecamp.org/news/sort-dictionary-by-value-in-python/ 

In [72]:
stats = dict(sorted(stats.items(), key=lambda item: item[1], reverse=True))

In [73]:
stats

{'75018': 1043,
 '75008': 798,
 '75010': 749,
 '75019': 745,
 '75001': 722,
 '75004': 670,
 '75013': 658,
 '75007': 657,
 '75009': 642,
 '75011': 641,
 '75005': 640,
 '75016': 614,
 '75012': 596,
 '75020': 587,
 '75006': 471,
 '75116': 421,
 '75017': 378,
 '75015': 363,
 '75014': 321,
 '75002': 297,
 '75003': 236,
 '93500': 6,
 '94320': 4,
 '93200': 1,
 '92220': 1,
 '93000': 1,
 '92170': 1,
 '93320': 1}

# Exercice 4 - Analyze CSV

- Write a Python code retrieves the file of the most loaned titles in libraries in Paris at: 

In [74]:
url = "https://opendata.paris.fr/explore/dataset/les-titres-les-plus-pretes/download/?format=csv&timezone=Europe/Berlin&lang=en&use_labels_for_header=true&csv_separator=%3B"

As we previously did a requests.get() will get us the csv file aswell 

In [75]:
import requests as s
import pandas as pd 
from io import StringIO

response = s.get(url).text

In [76]:
response 

'Type de document;Prêts 2022;Titre;Auteur;Nombre de localisations;Nombre de prêt total;Nombre d\'exemplaires\r\nBande dessinée jeunesse;1064;Razzia;Sobral,  Patrick;47;2938;67\r\nBande dessinée jeunesse;1024;Touche pas à mon veau;Guibert,  Emmanuel;45;2296;71\r\nBande dessinée jeunesse;1016;Max et Lili vont chez papy et mamie;Saint-Mars,  Dominique de;50;5554;103\r\nBande dessinée jeunesse;938;Lili veut un petit chat;Saint-Mars,  Dominique de;51;5789;80\r\nBande dessinée jeunesse;921;Max et Lili font du camping;Saint-Mars,  Dominique de;52;5658;83\r\nBande dessinée jeunesse;901;Lili trouve sa maîtresse méchante;Saint-Mars,  Dominique de;40;4694;72\r\nBande dessinée jeunesse;869;J\'irai où tu iras;Lyfoung,  Patricia;51;4707;76\r\nBande dessinée jeunesse;861;Les nerfs à vif;Nob;48;2837;62\r\nBande dessinée jeunesse;839;Je crois que je t\'aime;Lyfoung,  Patricia;52;3878;70\r\nBande dessinée jeunesse;799;Attention tornade;Cazenove,  Christophe;42;2366;56\r\nBande dessinée jeunesse;779;Max 

We have a csv file in the form of a string, which means we have tabular data where each row of the table is separated by "\r\n" and elements in a row are separated but a semicolon in the case. 
let's print a few lines (a few hundred characters in the whole string)

In [77]:
print(response[:349])

Type de document;Prêts 2022;Titre;Auteur;Nombre de localisations;Nombre de prêt total;Nombre d'exemplaires
Bande dessinée jeunesse;1064;Razzia;Sobral,  Patrick;47;2938;67
Bande dessinée jeunesse;1024;Touche pas à mon veau;Guibert,  Emmanuel;45;2296;71
Bande dessinée jeunesse;1016;Max et Lili vont chez papy et mamie;Saint-Mars,  Dominique de;50;


We already have the CSV content stored as a text object in memory. To efficiently process this data, we'll use the StringIO module from the io library, which allows us to treat the text object as if it were an actual file. Once we do this, we can easily load the data into a pandas DataFrame, which is a powerful tool for working with tabular data. This approach uses pandas, a library I'm most comfortable with for handling and analyzing such data.

In [78]:
with StringIO(response) as csvfile:
    books = pd.read_csv(csvfile, sep=';')
books

Unnamed: 0,Type de document,Prêts 2022,Titre,Auteur,Nombre de localisations,Nombre de prêt total,Nombre d'exemplaires
0,Bande dessinée jeunesse,1064,Razzia,"Sobral, Patrick",47,2938,67
1,Bande dessinée jeunesse,1024,Touche pas à mon veau,"Guibert, Emmanuel",45,2296,71
2,Bande dessinée jeunesse,1016,Max et Lili vont chez papy et mamie,"Saint-Mars, Dominique de",50,5554,103
3,Bande dessinée jeunesse,938,Lili veut un petit chat,"Saint-Mars, Dominique de",51,5789,80
4,Bande dessinée jeunesse,921,Max et Lili font du camping,"Saint-Mars, Dominique de",52,5658,83
...,...,...,...,...,...,...,...
837,Bande dessinée jeunesse,572,Aventure au pays des samouraïs,"Oda, Eiichir?",48,1511,54
838,Bande dessinée jeunesse,566,Yakari et la tueuse des mers,Job,43,3677,47
839,Bande dessinée jeunesse,555,Naruto. 59,"Kishimoto, Masashi",44,2735,53
840,Bande dessinée jeunesse,554,La famille avant tout,"Morel, Marylise",39,2285,44


One of the great features of pandas DataFrames is how easy it is to identify and count missing data. We can achieve this by generating a boolean array using the .isnull() (or .isna()) function, which marks missing values (NaN) as True and present values as False. By applying the .sum() function to this boolean array along the columns (using axis=0), pandas will return the total count of missing values for each column. This gives us a quick and efficient way to see which columns contain missing data and how many values are missing in each.

In [79]:
print(books.isnull().sum())

Type de document            0
Prêts 2022                  0
Titre                       0
Auteur                     15
Nombre de localisations     0
Nombre de prêt total        0
Nombre d'exemplaires        0
dtype: int64


In [80]:
print(f'The percentage of missing values within the "Auteur" column is: {(books.isnull().sum().sum() / len(books)) * 100:.2f}%')

The percentage of missing values within the "Auteur" column is: 1.78%


The rows with missing data in the 'Auteur' column make up only 1.78% of the total, so removing them wouldn't have a significant impact. However, there are alternatives to dropping rows, such as filling the missing values with statistical measures like the mean or median, depending on the context.

In [81]:
books = books.dropna(subset=['Auteur'])

- Analyze the resulting CSV file to display, for all entries: title, author, and total number of loans.

Let's first of all look at a whole row first :

In [82]:
books.iloc[0,:]

Type de document           Bande dessinée jeunesse
Prêts 2022                                    1064
Titre                                       Razzia
Auteur                            Sobral,  Patrick
Nombre de localisations                         47
Nombre de prêt total                          2938
Nombre d'exemplaires                            67
Name: 0, dtype: object

We want to display the book title, the author(s), and the number of loans, which correspond to the 3rd, 4th, and 6th columns in our dataset. To achieve this, we'll use the .iterrows() method on our DataFrame.
This method takes a DataFrame (or a portion of a dataframe) and returns an iterator that goes over each row and returns the row index along with the data for that row, thus allowing us to easily access column data within each row using their numerical index positions.

In [83]:
def disp_book(book):
    return f'"{book[2]}", by {book[3]} ({book[5]} loans)'

In [84]:
print('\n'.join( [disp_book(row) for  _, row in books.iloc[:20].iterrows()]))

"Razzia", by Sobral,  Patrick (2938 loans)
"Touche pas à mon veau", by Guibert,  Emmanuel (2296 loans)
"Max et Lili vont chez papy et mamie", by Saint-Mars,  Dominique de (5554 loans)
"Lili veut un petit chat", by Saint-Mars,  Dominique de (5789 loans)
"Max et Lili font du camping", by Saint-Mars,  Dominique de (5658 loans)
"Lili trouve sa maîtresse méchante", by Saint-Mars,  Dominique de (4694 loans)
"J'irai où tu iras", by Lyfoung,  Patricia (4707 loans)
"Les nerfs à vif", by Nob (2837 loans)
"Je crois que je t'aime", by Lyfoung,  Patricia (3878 loans)
"Attention tornade", by Cazenove,  Christophe (2366 loans)
"Max et Lili se posent des questions sur Dieu", by Saint-Mars,  Dominique de (4823 loans)
"Game over. 13. Toxic affair", by Midam (2652 loans)
"Les Schtroumpfs et la tempête blanche", by Jost,  Alain (975 loans)
"On a marché sur la lune", by Hergé (5674 loans)
"Astérix chez les Bretons", by Goscinny,  René (3014 loans)
"Parvati", by Ogaki,  Philippe (2616 loans)
"Les Schtroumpf

- Display for each type of document (there can be several entries for the same type of document), the total number of loans for this type. 

I decided to make a list of all the different document types, and then use that list in a loop. For each type, I filter the 'Type de document' column to only get the rows that match the current type. Then, I grab the 'Nombre de prêt total' for those rows and sum them up to find the total loans for that type.

In [85]:
type = list(set(books["Type de document"]))
stats = dict()
for t in type:
    total_loans = books[books["Type de document"] == t]["Nombre de prêt total"].sum()
    stats[t] = total_loans

In [35]:
stats

{'Bande dessinée jeunesse': 2281634,
 'Bande dessinée ado': 29819,
 'Livre jeunesse': 102273,
 'Bande dessinée adulte': 59726,
 'Livre sonore jeunesse': 10630,
 'Musique jeunesse': 4792,
 'Livre adulte': 41731,
 'Jeux de société prêtable': 1792}

- Display titles in order of profitability (in descending order of the number of loans per copy).

What I understood from this question is that profitability means the number of loans per copy. So, to make things clearer, I created a new column called 'profitability,' which is just the total number of loans ('Nombre de prêts total') divided by the number of copies ('Nombre d'exemplaires'). After that, I sorted the books by this new 'profitability' column in descending order using the sort_values() method.

In [20]:
books["profitability"] = books["Nombre de prêt total"] / books["Nombre d'exemplaires"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  books["profitability"] = books["Nombre de prêt total"] / books["Nombre d'exemplaires"]


The issue here is that i do not have the expected results, even tho i do not see where i did wrong.

In [23]:
books

Unnamed: 0,Type de document,Prêts 2022,Titre,Auteur,Nombre de localisations,Nombre de prêt total,Nombre d'exemplaires,profitability
0,Bande dessinée jeunesse,1064,Razzia,"Sobral, Patrick",47,2938,67,43.850746
1,Bande dessinée jeunesse,1024,Touche pas à mon veau,"Guibert, Emmanuel",45,2296,71,32.338028
2,Bande dessinée jeunesse,1016,Max et Lili vont chez papy et mamie,"Saint-Mars, Dominique de",50,5554,103,53.922330
3,Bande dessinée jeunesse,938,Lili veut un petit chat,"Saint-Mars, Dominique de",51,5789,80,72.362500
4,Bande dessinée jeunesse,921,Max et Lili font du camping,"Saint-Mars, Dominique de",52,5658,83,68.168675
...,...,...,...,...,...,...,...,...
837,Bande dessinée jeunesse,572,Aventure au pays des samouraïs,"Oda, Eiichir?",48,1511,54,27.981481
838,Bande dessinée jeunesse,566,Yakari et la tueuse des mers,Job,43,3677,47,78.234043
839,Bande dessinée jeunesse,555,Naruto. 59,"Kishimoto, Masashi",44,2735,53,51.603774
840,Bande dessinée jeunesse,554,La famille avant tout,"Morel, Marylise",39,2285,44,51.931818


In [24]:
sorted_books = books.sort_values(by='profitability', ascending=False)

In [25]:
print('\n'.join( [disp_book(row) for _, row in sorted_books[:20].iterrows()]))

"Un enfant chez les schtroumpfs", by Díaz Vizoso,  Miguel (4504 loans)
"Mon meilleur ami", by Verron,  Laurent (4662 loans)
"Les vacances infernales", by Cohen,  Jacqueline (5014 loans)
"Bande de sauvages !", by Cohen,  Jacqueline (5761 loans)
"Trop, c'est trop !", by Cohen,  Jacqueline (4504 loans)
"Les fous du mercredi", by Cohen,  Jacqueline (5169 loans)
"Ca va chauffer !", by Cohen,  Jacqueline (4071 loans)
"Ca roule !", by Cohen,  Jacqueline (5763 loans)
"Salut, les zinzins !", by Cohen,  Jacqueline (4565 loans)
"Les deux terreurs", by Cohen,  Jacqueline (3999 loans)
"Subliiiimes !", by Cohen,  Jacqueline (5007 loans)
"Un copieur sachant copier", by Godi,  Bernard (3481 loans)
"A l'attaque !", by Cohen,  Jacqueline (4353 loans)
"Tom-Tom et l'impossible Nana", by Cohen,  Jacqueline (5832 loans)
"Abracada...boum !", by Cohen,  Jacqueline (4149 loans)
"Poux, papous et pas papous", by Cohen,  Jacqueline (4466 loans)
"Ici Radio-casserole", by Cohen,  Jacqueline (3584 loans)
"Tremblez, 

# Exercice 5 * - Analyze HTML

- Write a Python program that gets the content of the Wikipedia page at: 

In [86]:
url = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population_density"

It goes the same way as the previous exercices, we'll use the requests library and the get method to access the content of a web page via it's URL. 
HTML web pages much like XML file, the print function gives us a structured view of the page since the requests function retrieves the raw data. 

In [87]:
page = s.get(url)
print(page.text)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-toc-available" lang="en" dir="ltr">
<head>
<meta charset="UTF-8">
<title>List of countries and dependencies by population density - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-fe

- Display all the countries mentioned in the table. 

I couldn't just print the entire webpage's HTML code and manually search for the specific line containing the country names. The best approach is to open the URL and inspect the page using the browser's developer tools. By doing so, we can view the Document Object Model (DOM), which is a structured version of the webpage's HTML. To find the specific elements, we can hover over parts of the page with the mouse, and the corresponding HTML code will be highlighted, allowing us to locate the relevant section.

I asked chatgpt for help here especially for the lambda function. 

In [88]:
from bs4 import BeautifulSoup as Soup
soup = Soup(page.text, "html.parser")
table = soup.find_all("a", title=lambda t: t and t.startswith("Demographics of"))
countries = [c.text for c in table if c.text.strip()]

I can see that the last two elements aren't real countries so we'll just get rid of them. 

In [89]:
countries[-2:]

['Demographics of the world', 'Antarctica']

In [90]:
countries = countries[:-2]
countries

['Macau',
 'Monaco',
 'Singapore',
 'Hong Kong',
 'Gibraltar',
 'Bahrain',
 'Maldives',
 'Malta',
 'Vatican City',
 'Sint Maarten',
 'Bermuda',
 'Bangladesh',
 'Guernsey',
 'Jersey',
 'Mayotte',
 'Palestine',
 'Taiwan',
 'Mauritius',
 'Barbados',
 'Nauru',
 'Saint Martin',
 'Aruba',
 'San Marino',
 'Rwanda',
 'South Korea',
 'Lebanon',
 'Saint Barthélemy',
 'Burundi',
 'Tuvalu',
 'India',
 'Curaçao',
 'Netherlands',
 'Haiti',
 'Israel',
 'Réunion',
 'Philippines',
 'Belgium',
 'Comoros',
 'Grenada',
 'Puerto Rico',
 'Martinique',
 'Sri Lanka',
 'Japan',
 'Guam',
 'El Salvador',
 'Pakistan',
 'Trinidad and Tobago',
 'Vietnam',
 'Saint Lucia',
 'U.S. Virgin Islands',
 'United Kingdom',
 'Saint Vincent and the Grenadines',
 'Cayman Islands',
 'Jamaica',
 'Luxembourg',
 'Liechtenstein',
 'Gambia',
 'Nigeria',
 'Kuwait',
 'Guadeloupe',
 'São Tomé and Príncipe',
 'Seychelles',
 'Qatar',
 'Germany',
 'Dominican Republic',
 'Marshall Islands',
 'Malawi',
 'American Samoa',
 'North Korea',
 'An

- Display for each country its rank, density, population, area. 

Let's start by looking at one row of the countries table : 

In [91]:
trows = soup.find_all("tr")
trows[1]

<tr>
<td style="text-align:left"><span class="flagicon" style="display:inline-block;width:25px;text-align:left"><span class="mw-image-border" typeof="mw:File"><span><img alt="" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/63/Flag_of_Macau.svg/23px-Flag_of_Macau.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/63/Flag_of_Macau.svg/35px-Flag_of_Macau.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/63/Flag_of_Macau.svg/45px-Flag_of_Macau.svg.png 2x" width="23"/></span></span></span> <a href="/wiki/Demographics_of_Macau" title="Demographics of Macau">Macau</a> (<a href="/wiki/China" title="China">China</a>)</td>
<td data-sort-value="7010543897503170560" style="text-align:right;">21,000
</td>
<td data-sort-value="7010543897503170560" style="text-align:right;">54,000</td>
<td>704,150</td>
<td style="text-align:right;">33
</td>
<td style="text-align:right;

we can see that of all the table rows existing in the page the 2nd one (index 1) corresponds to the 'Macau' row in the table which is the first country on the table. 

We know that the information we're interested in begins from the second row of the trows list. To extract and display this data, we can simply loop through the rows (ignoring the first one). For each row, we use the find_all function to gather all the elements (<td> tags) in that row, which will give us access to the country name, population, area, and other related information.

In [95]:
len(countries)

249

In [101]:
data = []
for row in trows[1:253]:
    cells = row.find_all('td')
    
    country = cells[0].text.strip()
    population = cells[1].text.strip()
    area_km2 = cells[2].text.strip()
    area_sq_mi = cells[3].text.strip()
    density = cells[4].text.strip()

    # Store the extracted data
    data.append({
        'Country': country,
        'Population': population,
        'Area (km²)': area_km2,
        'Area (sq mi)': area_sq_mi,
        'Density': density,
    })


In [102]:
data

[{'Country': 'Macau (China)',
  'Population': '21,000',
  'Area (km²)': '54,000',
  'Area (sq mi)': '704,150',
  'Density': '33'},
 {'Country': 'Monaco',
  'Population': '18,000',
  'Area (km²)': '47,000',
  'Area (sq mi)': '36,298',
  'Density': '2.0'},
 {'Country': 'Singapore',
  'Population': '8,250',
  'Area (km²)': '21,400',
  'Area (sq mi)': '6,014,723',
  'Density': '729'},
 {'Country': 'Hong Kong (China)',
  'Population': '6,725',
  'Area (km²)': '17,420',
  'Area (sq mi)': '7,491,609',
  'Density': '1,114'},
 {'Country': 'Gibraltar (UK)',
  'Population': '4,800',
  'Area (km²)': '12,000',
  'Area (sq mi)': '32,688',
  'Density': '6.8'},
 {'Country': 'Bahrain',
  'Population': '1,910',
  'Area (km²)': '4,900',
  'Area (sq mi)': '1,485,510',
  'Density': '778'},
 {'Country': 'Maldives',
  'Population': '1,750',
  'Area (km²)': '4,500',
  'Area (sq mi)': '523,787',
  'Density': '300'},
 {'Country': 'Malta',
  'Population': '1,700',
  'Area (km²)': '4,400',
  'Area (sq mi)': '535,

- Save the information obtained in a Python dictionary. 

The point here is to use the countries and data lists to create a dictionary that will allow us to access a country’s information by its name. For that, we can just use the zip function to combine the two lists. The zip function takes each country from the countries list and pairs it with the corresponding data from the data list (which contains information like population, area, and density). The dictionary comprehension ```{country: info for country, info in zip(countries, data)}``` creates a dictionary where the country name is the key and the associated data dictionary ```(with population, area, and density)``` is the value.

In [103]:
C_info = {country: info for country, info in zip(countries, data)}
C_info

{'Macau': {'Country': 'Macau (China)',
  'Population': '21,000',
  'Area (km²)': '54,000',
  'Area (sq mi)': '704,150',
  'Density': '33'},
 'Monaco': {'Country': 'Monaco',
  'Population': '18,000',
  'Area (km²)': '47,000',
  'Area (sq mi)': '36,298',
  'Density': '2.0'},
 'Singapore': {'Country': 'Singapore',
  'Population': '8,250',
  'Area (km²)': '21,400',
  'Area (sq mi)': '6,014,723',
  'Density': '729'},
 'Hong Kong': {'Country': 'Hong Kong (China)',
  'Population': '6,725',
  'Area (km²)': '17,420',
  'Area (sq mi)': '7,491,609',
  'Density': '1,114'},
 'Gibraltar': {'Country': 'Gibraltar (UK)',
  'Population': '4,800',
  'Area (km²)': '12,000',
  'Area (sq mi)': '32,688',
  'Density': '6.8'},
 'Bahrain': {'Country': 'Bahrain',
  'Population': '1,910',
  'Area (km²)': '4,900',
  'Area (sq mi)': '1,485,510',
  'Density': '778'},
 'Maldives': {'Country': 'Maldives',
  'Population': '1,750',
  'Area (km²)': '4,500',
  'Area (sq mi)': '523,787',
  'Density': '300'},
 'Malta': {'Co

- Using the previously saved Python dictionary, ask the user for a country, display the 
corresponding information.

Before writing this part of the code, we need to take care of a couple of important things:

1. Handling invalid inputs: Although our database is designed to contain information about all countries, there is always a chance that a user might enter a misspelled or non-existent country name. This could lead to an error if not handled properly. So, we need to check if the entered country exists in the database and provide an error message if it doesn't.

2. Input format: Since country names in the database start with a capital letter, user inputs that are entirely lowercase or uppercase might not match the names correctly. To handle this, we can use the .title() method to automatically convert the user's input to title case (where the first letter of each word is capitalized), ensuring the country name matches the database format regardless of how the user types it.

In [108]:
country_info = input("Enter a Country name : ").title()
if country_info in C_info:
        print(f"{country_info}'s general informations are:\n"
              f"1. Population : {C_info[country_info]['Population']}\n"
              f"2. Area (km²) : {C_info[country_info]['Area (km²)']}\n"
              f"3. Area (sq mi) : {C_info[country_info]['Area (sq mi)']}\n"
              f"4. Density : {C_info[country_info]['Density']}")
else:
    print(f"Sorry, '{country_info}' is not in the database. Please try again.")

Japan's general informations are:
1. Population : 326
2. Area (km²) : 840
3. Area (sq mi) : 123,294,513
4. Density : 377,930


# Exercice 6 * - API Web

- Write a Python program that will make available a Web API allowing elementary calculations on 
integers.

The APIs are accessible by GET and in the form: 
- /add/{integer1}/{integer2}: add integer1 and integer2
- /sub/{integer1}/{integer2}: perform the subtraction of integer1 and integer2
- /mul/{integer1}/{integer2}: carry out the multiplication of integer1 and integer2
- /div/{integer1}/{integer2}: perform the integer division of integer1 by integer2
- /mod/{integer1}/{integer2}: perform the remainder of the integer division of integer1
by integer2

I will use the Flask module to solve this exercise, as I have some experience working with it.

First, we'll import the Flask class from the Flask module. This class allows us to create a web application, represented by an instance of the Flask class. This app object is responsible for handling incoming requests and routing them to the appropriate functions. To define these routes, we use the @app.route() decorator, which specifies the URL path that will trigger a particular function. One of the services we want to offer via this app, is a functionality that adds two numbers, we will define a route using ```@app.route('/add/<int:integer1>/<int:integer2>')```. This route will handle requests to the /add URL with two integer parameters, a and b, and the associated function will perform the addition operation.

We'll have to create individual routes for each operation and then run the app. 

In [195]:
from flask import Flask



app = Flask(__name__)


@app.route('/add/<int:integer1>/<int:integer2>', methods=['GET'])
def add(integer1, integer2):
    result = integer1 + integer2
    return {'result': result}


@app.route('/sub/<int:integer1>/<int:integer2>', methods=['GET'])
def sub(integer1, integer2):
    result = integer1 - integer2
    return {'result': result}

@app.route('/mul/<int:integer1>/<int:integer2>', methods=['GET'])
def mul(integer1, integer2):
    result = integer1 * integer2
    return {'result': result}


@app.route('/div/<int:integer1>/<int:integer2>', methods=['GET'])
def div(integer1, integer2):
    if integer2 == 0:
        return {'error': 'Division by zero is not allowed'}, 400
    result = integer1 // integer2
    return {'result': result}

@app.route('/mod/<int:integer1>/<int:integer2>', methods=['GET'])
def mod(integer1, integer2):
    if integer2 == 0:
        return {'error': 'Division by zero is not allowed'}, 400
    result = integer1 % integer2
    return {'result': result}



I happen to know from previous experience that running a flask app on a jupiter notebook using 'app.run()' starts a web server that listens for incoming requests and serves responses. However, one thing to understand is that this Flask app runs in a blocking manner. Once we start the Flask server, it takes over the execution and waits indefinitely for incoming HTTP requests, meaning that if we're running Flask inside a Jupyter Notebook cell, no other cells can be executed until we manually stop the Flask server (by interrupting it).

One way to handle this issue is to use threading (we're used to it from the first TME and MP). By using a separate thread for the Flask app, we allow the server to run independently without blocking the main program. The main thread can still execute other tasks (or cells) while the Flask server handles web requests in the background, tha way we can try the web app by using a GET request using requests.

In [None]:
from threading import Thread

def run_flask():
    app.run(host='localhost', port=8080, debug=False, use_reloader=False)

# Lancer le thread Flask
flask_thread = Thread(target=run_flask)
flask_thread.start()

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://localhost:8080
Press CTRL+C to quit


http://localhost:8080/mul/6/7

http://localhost:8080/div/42/8

http://localhost:8080/mod/42/8

- Write a Python program that will test the web API made available through the requests
library. 

We'll just use the requests to access the API's services with a GET method like wwe did thtough the whole TME and we should be able to print the result. 

In [197]:
response = s.get("http://localhost:8080/mod/42/8")
print(response.text)

127.0.0.1 - - [17/Oct/2024 21:36:55] "GET /mod/42/8 HTTP/1.1" 200 -


{"result":2}



Let's try a second approach, that i leanred while reading the PROGRES lecture (web servers):
Instead of running our app in the background of our notebook (using threading) we will create a Python file that contains our Flask application. This is done using the ```%%writefile``` magic command in Jupyter, which saves the code to a file named run.py.

After saving the Flask app to run.py, we start the server by running the file and instead of running it in the same Jupyter Notebook (which could block other cells), we use the ```!wt python run.py``` command, which opens a new terminal window to execute run.py.

In [123]:
%%writefile run.py
from flask import Flask

app = Flask(__name__)


@app.route('/add/<int:integer1>/<int:integer2>', methods=['GET'])
def add(integer1, integer2):
    result = integer1 + integer2
    return {'result': result}


@app.route('/sub/<int:integer1>/<int:integer2>', methods=['GET'])
def sub(integer1, integer2):
    result = integer1 - integer2
    return {'result': result}

@app.route('/mul/<int:integer1>/<int:integer2>', methods=['GET'])
def mul(integer1, integer2):
    result = integer1 * integer2
    return {'result': result}


@app.route('/div/<int:integer1>/<int:integer2>', methods=['GET'])
def div(integer1, integer2):
    if integer2 == 0:
        return {'error': 'Division by zero is not allowed'}, 400
    result = integer1 // integer2
    return {'result': result}

@app.route('/mod/<int:integer1>/<int:integer2>', methods=['GET'])
def mod(integer1, integer2):
    if integer2 == 0:
        return {'error': 'Division by zero is not allowed'}, 400
    result = integer1 % integer2
    return {'result': result}


app.run(host='localhost', port=8080)

Overwriting run.py


In [124]:
!wt python run.py

In [125]:
response = s.get("http://localhost:8080/mod/42/8")
print(response.text)

{"result":2}

