# PROGRES - TME2

Fabien Mathieu - fabien.mathieu@normalesup.org

Sébastien Tixeuil - Sebastien.Tixeuil@lip6.fr

**Note**: 
- Star exercises (indicated by *) should only be done if all other exercises have been completed. You don't have to do them if you do not want to.

# Rules

1. Cite your sources
2. One file to rule them all
3. Explain
4. Execute your code


https://github.com/balouf/progres/blob/main/rules.ipynb

# Exercise 1 - Regular Expressions

Consider the following list:

In [1]:
L = ['marie.Dupond@gmail.com', 'lucie.Durand@wanadoo.fr',
'Sophie.Parmentier @@ gmail.com', 'franck.Dupres.gmail.com',
'pierre.Martin@lip6 .fr ',' eric.Deschamps@gmail.com ']

- Which of these entries are valid?
- Use regular expressions to identify valid *gmail* addresses and display them. 

Answer

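The valid entries are `'marie.Dupond@gmail.com'`, `'lucie.Durand@wanadoo.fr'` and `' eric.Deschamps@gmail.com '`; only the first and the last are gmail addresses. Otherwise valid strings that are whitespace-padded are also considered valid, as stripping is a simple operation, and this lends itself to a better user experience (if the user doesn't realize there is an invisible space, for example). The remaining entries are invalid: one has a doubled `@` surrounded by spaces, one has no `@` at all, and one has a space inside the domain.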

In [2]:
import re
import functools
from typing import List

# The dots are escaped and the end is anchored, so that strings such as
# 'x@gmailxcom' or 'x@gmail.com.example.org' are rejected.
GMAIL_RE = re.compile(r'^\s*([0-9A-Za-z_.]+@gmail\.com)\s*$')

def _true_gmail_reducer(accumulator: List[str], test_address: str) -> List[str]:
    gmail_match = GMAIL_RE.match(test_address)
    if not gmail_match:
        return accumulator
    address = gmail_match.group(1)
    return accumulator + [address]

def true_gmail(mail_list: List[str]) -> List[str]:
    return functools.reduce(_true_gmail_reducer, mail_list, [])

### Explanation

The `true_gmail` function transforms a list of strings into a list of the whitespace-stripped gmail addresses found in it. Because the output values may differ from the input values (the padding is removed), a `reduce` is used in place of a `filter`.

The reducer implements the logic. It tests against a gmail regex and implements two cases:
1. If there is no match, throw out the address by returning the unchanged accumulator
2. Otherwise, continue to the next iteration with the desired portion of the address, by returning the accumulator with the address portion appended

Note that `+` is used for list extension rather than `.append`. This is to prevent any unexpected behavior that could come from mutation.
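
As a point of comparison, the same "filter and extract" step can be sketched with a list comprehension. This is only an illustration, not the notebook's answer: `GMAIL_SKETCH_RE` and `true_gmail_comprehension` are hypothetical names, and the pattern assumes the dots are escaped and the end of the string is anchored.

```python
import re
from typing import List

# Assumed pattern: same idea as GMAIL_RE, with escaped dots and an end anchor.
GMAIL_SKETCH_RE = re.compile(r'^\s*([0-9A-Za-z_.]+@gmail\.com)\s*$')

def true_gmail_comprehension(mail_list: List[str]) -> List[str]:
    # Match each string once, keep the successful matches, extract group 1.
    matches = (GMAIL_SKETCH_RE.match(address) for address in mail_list)
    return [m.group(1) for m in matches if m]

sample = ['marie.Dupond@gmail.com', 'franck.Dupres.gmail.com', ' eric.Deschamps@gmail.com ']
print(true_gmail_comprehension(sample))
# ['marie.Dupond@gmail.com', 'eric.Deschamps@gmail.com']
```

No list is mutated here either, so the concern about side effects raised above does not arise.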

In [3]:
true_gmail(L)

['marie.Dupond@gmail.com', 'eric.Deschamps@gmail.com']

- Use regular expressions to check if a string ends with a number. 

Answer

In [4]:
def ends_with_number(txt: str) -> bool:
    return bool(re.match(r'^.*\d$', txt))

### Explanation

`ends_with_number` checks for a match of the given parameter against the regular expression `^.*\d$`. The regular expression could be worded in English as: "match anything from the beginning of the string, then match a digit followed by the string end".

`re.match` returns a `Match` object if a match is present, and `None` otherwise, but `ends_with_number` wants to return a boolean indicating yes or no. The result of `re.match` is transformed to the desired output by simply being passed to `bool`.
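
A possible variant, shown here only as a sketch, uses `re.search` so that the leading `^.*` is not needed, and `\Z` so that a trailing newline is not silently accepted (with `$`, a string such as `'abc1\n'` still matches). The name `ends_with_digit` is hypothetical.

```python
import re

def ends_with_digit(txt: str) -> bool:
    # re.search scans the whole string, so no '^.*' prefix is required.
    # \Z matches only at the very end of the string, unlike $, which also
    # matches just before a final newline.
    return re.search(r'\d\Z', txt) is not None

print(ends_with_digit('to42to'))     # False
print(ends_with_digit('to42to666'))  # True
print(ends_with_digit('abc1\n'))     # False
```

Whether a trailing newline should count is a design choice; the stricter reading is assumed here.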

In [5]:
ends_with_number('to42to')

False

In [6]:
ends_with_number('to42to666')

True

- Use regular expressions to remove problematic zeros from an IPv4 address expressed as a 
string. (example: "216.08.094.196" should become "216.8.94.196", but "216.80.140.196" 
should remain "216.80.140.196"). 

Answer

In [7]:
IPV4_FIELD_RE = re.compile(r'0*(\d{1,3})')

def normalize_ip(txt):
    return '.'.join(IPV4_FIELD_RE.findall(txt))

### Explanation

`normalize_ip` uses a regular expression to match the desired substring for each sequence within an IPv4 address. The list of desired sequences is taken using `.findall`, which is then re-formatted to an IPv4 string using `'.'.join`. 

The regular expression used is `0*(\d{1,3})`. There are two parts to this expression:
1. `0*` matches 0 or more of the character `0`, at the beginning of the sequence, outside the capture group
2. `(\d{1,3})` matches 1-3 digits in a row for a sequence, and puts them in a capture group

The first part excludes leading `0`s from the capture group without requiring any to be present. Because the second part must match at least one digit, a field whose value is zero still yields a `0`. For the edge case `'000'`, the greedy `0*` first consumes all three characters, then backtracks by one so that `\d{1,3}` can capture the last `0`.
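
An alternative that edits the string in place, rather than splitting and re-joining it, can be sketched with `re.sub`. This is an assumption-laden illustration, not the answer above: `normalize_ip_sub` is a hypothetical name, and the input is assumed to be a well-formed dotted IPv4 string.

```python
import re

def normalize_ip_sub(txt: str) -> str:
    # \b0+ matches zeros at the start of a field; (\d) forces one digit to
    # remain, so a field that is entirely zeros collapses to a single '0'.
    return re.sub(r'\b0+(\d)', r'\1', txt)

print(normalize_ip_sub('216.08.094.196'))  # 216.8.94.196
print(normalize_ip_sub('216.80.140.196'))  # 216.80.140.196
print(normalize_ip_sub('000.00.0.000'))    # 0.0.0.0
```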

In [8]:
normalize_ip("216.0.094.196")

'216.0.94.196'

In [9]:
normalize_ip("216.08.094.196")

'216.8.94.196'

In [10]:
normalize_ip("216.80.140.196")

'216.80.140.196'

In [11]:
normalize_ip("000.00.0.000")

'0.0.0.0'

- Use regular expressions to transform a date from MM-DD-YYYY format to DD-MM-YYYY 
format. (example "11-06-2020" should become "06-11-2020"). Optionally*, do the same thing using the `datetime` package.

Answer

In [12]:
DATE_RE = re.compile(r'^(\d{2})-(\d{2})-(\d{4})$')

def switch_md(txt: str) -> str:
    mm, dd, yyyy = DATE_RE.match(txt).groups()
    return '-'.join([dd, mm, yyyy])

### Explanation

`switch_md` uses a regex to match a full date string and grab groups of each section, then re-orders and re-joins them to the desired format.

Note that the function assumes the `txt` parameter matches this format. If it does not, `DATE_RE.match` returns `None` and the call to `.groups()` raises an `AttributeError`.
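
For the optional `datetime` variant, a minimal sketch could parse the string with one format and print it with the other. The name `switch_md_datetime` is hypothetical. Unlike the regex version, this also rejects impossible dates, by raising `ValueError`.

```python
from datetime import datetime

def switch_md_datetime(txt: str) -> str:
    # Parse as MM-DD-YYYY, then format as DD-MM-YYYY.
    parsed = datetime.strptime(txt, '%m-%d-%Y')
    return parsed.strftime('%d-%m-%Y')

print(switch_md_datetime('11-06-2020'))  # 06-11-2020
```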

In [13]:
switch_md("11-06-2020")

'06-11-2020'

# Exercise 2 - Analyze XML

- Write a Python code that retrieves the content of the page at:

In [14]:
url = "https://www.w3schools.com/xml/cd_catalog.xml"

In [15]:
from requests import Session
import xml.etree.ElementTree as ET

s = Session()
r = s.get(url)

### Explanation

To retrieve the URL content, `Session.get` is used, which gives the option to keep cookies and re-use a TCP connection if multiple requests were made.

- Look at the text content and load as xml.

In [16]:
print(r.text)
cds = ET.fromstring(r.text)
print(f"Main tag: {cds.tag}; main attributes: {cds.attrib}")

<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <CD>
    <TITLE>Still got the blues</TITLE>
    <ARTIST>Gary Moore</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Virgin records</COMPANY>
    <PRICE>10.20</PRICE>
    <YEAR>1990</YEAR>
  </CD>
  <CD>
    <TITLE>Eros</TITLE>
    <ARTIST>Eros Ramazzotti</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>BMG</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1997</YEAR>
  </CD>
  <CD>
    <TITLE>

### Explanation

To load the result as XML, `ElementTree.fromstring` is used, for simplicity's sake.

Answer

- Write a `display_cd` function that displays (i.e. `print`), for a CD: title, artist, country, company, year.
- Display all CDs.

Answer

In [17]:
def display_cd(cd: ET.Element) -> None:
    properties = [f'{child.tag}: {child.text}' for child in cd]
    print(', '.join(properties))

### Explanation

The chosen format for displaying a CD is to display all child tags and their text content, separated by commas. This is done by first creating a list of tags + values with the desired format, and then utilizing `.join` to easily intersperse commas, and printing the result.
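
If only the fields named in the question (title, artist, country, company, year) should appear, one possible sketch looks them up by tag instead of iterating over all children. The names `format_cd` and `WANTED` are hypothetical, and the inline sample only mimics the catalog structure; nothing is fetched from the network.

```python
import xml.etree.ElementTree as ET

# Tiny inline sample with the same structure as the catalog.
SAMPLE = """<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST><COUNTRY>USA</COUNTRY>
      <COMPANY>Columbia</COMPANY><PRICE>10.90</PRICE><YEAR>1985</YEAR></CD>
</CATALOG>"""

WANTED = ('TITLE', 'ARTIST', 'COUNTRY', 'COMPANY', 'YEAR')

def format_cd(cd: ET.Element) -> str:
    # findtext returns the text of the first matching child, or the default.
    return ', '.join(f"{tag}: {cd.findtext(tag, default='<missing>')}" for tag in WANTED)

catalog = ET.fromstring(SAMPLE)
print(format_cd(catalog[0]))
# TITLE: Empire Burlesque, ARTIST: Bob Dylan, COUNTRY: USA, COMPANY: Columbia, YEAR: 1985
```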

- Display all 1980s CDs. 

In [18]:
for cd in cds:
  display_cd(cd)

TITLE: Empire Burlesque, ARTIST: Bob Dylan, COUNTRY: USA, COMPANY: Columbia, PRICE: 10.90, YEAR: 1985
TITLE: Hide your heart, ARTIST: Bonnie Tyler, COUNTRY: UK, COMPANY: CBS Records, PRICE: 9.90, YEAR: 1988
TITLE: Greatest Hits, ARTIST: Dolly Parton, COUNTRY: USA, COMPANY: RCA, PRICE: 9.90, YEAR: 1982
TITLE: Still got the blues, ARTIST: Gary Moore, COUNTRY: UK, COMPANY: Virgin records, PRICE: 10.20, YEAR: 1990
TITLE: Eros, ARTIST: Eros Ramazzotti, COUNTRY: EU, COMPANY: BMG, PRICE: 9.90, YEAR: 1997
TITLE: One night only, ARTIST: Bee Gees, COUNTRY: UK, COMPANY: Polydor, PRICE: 10.90, YEAR: 1998
TITLE: Sylvias Mother, ARTIST: Dr.Hook, COUNTRY: UK, COMPANY: CBS, PRICE: 8.10, YEAR: 1973
TITLE: Maggie May, ARTIST: Rod Stewart, COUNTRY: UK, COMPANY: Pickwick, PRICE: 8.50, YEAR: 1990
TITLE: Romanza, ARTIST: Andrea Bocelli, COUNTRY: EU, COMPANY: Polydor, PRICE: 10.80, YEAR: 1996
TITLE: When a man loves a woman, ARTIST: Percy Sledge, COUNTRY: USA, COMPANY: Atlantic, PRICE: 8.70, YEAR: 1987
TITLE

### Explanation

The root element has CDs as sub-elements. Since `display_cd` expects a single CD record, we iterate through the root and pass each child to `display_cd`.

Answer

- Display all British CDs.

In [19]:
british_cds = cds.findall("CD[COUNTRY='UK']")
for bcd in british_cds:
  display_cd(bcd)


TITLE: Hide your heart, ARTIST: Bonnie Tyler, COUNTRY: UK, COMPANY: CBS Records, PRICE: 9.90, YEAR: 1988
TITLE: Still got the blues, ARTIST: Gary Moore, COUNTRY: UK, COMPANY: Virgin records, PRICE: 10.20, YEAR: 1990
TITLE: One night only, ARTIST: Bee Gees, COUNTRY: UK, COMPANY: Polydor, PRICE: 10.90, YEAR: 1998
TITLE: Sylvias Mother, ARTIST: Dr.Hook, COUNTRY: UK, COMPANY: CBS, PRICE: 8.10, YEAR: 1973
TITLE: Maggie May, ARTIST: Rod Stewart, COUNTRY: UK, COMPANY: Pickwick, PRICE: 8.50, YEAR: 1990
TITLE: For the good times, ARTIST: Kenny Rogers, COUNTRY: UK, COMPANY: Mucik Master, PRICE: 8.70, YEAR: 1995
TITLE: Tupelo Honey, ARTIST: Van Morrison, COUNTRY: UK, COMPANY: Polydor, PRICE: 8.20, YEAR: 1971
TITLE: The very best of, ARTIST: Cat Stevens, COUNTRY: UK, COMPANY: Island, PRICE: 8.90, YEAR: 1990
TITLE: Stop, ARTIST: Sam Brown, COUNTRY: UK, COMPANY: A and M, PRICE: 8.90, YEAR: 1988
TITLE: Bridge of Spies, ARTIST: T'Pau, COUNTRY: UK, COMPANY: Siren, PRICE: 7.90, YEAR: 1987
TITLE: Private

### Explanation

This code uses XPath to find all British CDs. It does this by selecting all `CD` tags which have a sub-tag `COUNTRY` with the text value `UK`.

Reference: [XPath section of the ElementTree docs](https://docs.python.org/3/library/xml.etree.elementtree.html#xpath-support)
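
The "1980s CDs" question has no answer cell here, and an equality predicate like the one above does not express a range. As far as I know, ElementTree's XPath subset has no numeric comparison, so a minimal sketch filters in Python instead. The name `cds_from_1980s` is hypothetical and the inline sample is reduced to the two tags the filter needs.

```python
import xml.etree.ElementTree as ET
from typing import List

SAMPLE = """<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><YEAR>1985</YEAR></CD>
  <CD><TITLE>Still got the blues</TITLE><YEAR>1990</YEAR></CD>
  <CD><TITLE>Sylvias Mother</TITLE><YEAR>1973</YEAR></CD>
</CATALOG>"""

def cds_from_1980s(catalog: ET.Element) -> List[ET.Element]:
    result = []
    for cd in catalog.findall('CD'):
        year = cd.findtext('YEAR', default='')
        # Skip records whose year is missing or not purely numeric.
        if year.isdigit() and 1980 <= int(year) <= 1989:
            result.append(cd)
    return result

titles = [cd.findtext('TITLE') for cd in cds_from_1980s(ET.fromstring(SAMPLE))]
print(titles)  # ['Empire Burlesque']
```

Each returned element could then be passed to `display_cd`, as is done for the British CDs.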

Answer

# Exercise 3 - Analyze JSON

- Write a Python program that gets the file of filming locations in Paris at: 

In [20]:
url = "https://opendata.paris.fr/explore/dataset/lieux-de-tournage-a-paris/download/?format=json&timezone=Europe/Berlin&lang=fr"

- How many entries have you got?

In [21]:
import json
from pathlib import Path

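def download(source_url, dest_file):
  s = Session()
  # Caution: this disables TLS certificate verification. Acceptable for a
  # throwaway exercise, unsafe anywhere else.
  s.verify = False
  r = s.get(source_url, stream=True)
  r.raise_for_status()  # stop on an HTTP error instead of saving an error page
  dest_file = Path(dest_file)

  # Stream the body to disk in chunks rather than loading it all in memory
  with open(dest_file, 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
      if chunk:
        f.write(chunk)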

FN = 'tournage.json'
download(url, FN)

with open(FN, encoding='utf-8') as f:
  locs = json.load(f)

print('Entry count:', len(locs))



Entry count: 12265


### Explanation

This code makes use of the sample `download` function from the slides. The JSON file is downloaded to `tournage.json`, which is then re-opened to analyze. Since there is an array at the root, `len` is simply called on the loaded JSON to get the entry count.

Answer

- Analyze the JSON file: what is its structure?
- Write a function that converts an entry into a string that shows director, title, district, start date, end date, and geographic coordinates.
- Convert all entries into strings (warning: some entries may have issues).
- Display the first 20 entries.

Answer

In [22]:
def display_loc(entry):
    fields = entry['fields']
    director = fields.get('nom_realisateur', '<director missing>')
    title = fields.get('nom_tournage', '<title missing>')
    district = fields.get('ardt_lieu', '<district missing>')
    start_date = fields.get('date_debut', '<start date missing>')
    end_date = fields.get('date_fin', '<end date missing>')
    coord_x = fields.get('coord_x', '<x coordinate missing>')
    coord_y = fields.get('coord_y', '<y coordinate missing>')

    return f"{director}'s \"{title},\" filmed in {district} ({coord_x}, {coord_y}) from {start_date} to {end_date}"

### Explanation

The metadata of each entry is stored under the `'fields'` key, but an entry may lack some of these fields. To guard against this, `dict.get` is used to supply a default value when a key is missing.

### File structure

The JSON structure is an array of entries. The following is a formatted entry, to give an example of real data:

```json
{
   "datasetid":"lieux-de-tournage-a-paris",
   "recordid":"0ff321c5b140a12a8e50a1b212a7c5f5bced91d7",
   "fields":{
      "coord_x":2.37006242,
      "id_lieu":"2017-751",
      "adresse_lieu":"rue du faubourg du temple, 75011 paris",
      "geo_shape":{
         "coordinates":[
            2.370062415669748,
            48.8696979988026
         ],
         "type":"Point"
      },
      "coord_y":48.869698,
      "ardt_lieu":"75011",
      "nom_tournage":"2 Fils (Nouvelle Demande Décor Librairie / Journées interverties)",
      "nom_realisateur":"Félix MOATI",
      "date_debut":"2017-10-19",
      "type_tournage":"Long métrage",
      "annee_tournage":"2017",
      "nom_producteur":"NORD OUEST FILMS",
      "date_fin":"2017-10-19",
      "geo_point_2d":[
         48.8696979988026,
         2.370062415669748
      ]
   },
   "geometry":{
      "type":"Point",
      "coordinates":[
         2.370062415669748,
         48.8696979988026
      ]
   },
   "record_timestamp":"2024-01-31T13:40:46.402+01:00"
}
```

Each entry may be missing specific keys from `"fields"`. 

In [23]:
all_entries = [display_loc(e) for e in locs]
print('\n'.join(all_entries[:20]))

ANNE FONTAINE's "POLICE," filmed in 75012 (2.39934074, 48.83798025) from 2019-03-08 to 2019-03-09
Eli Ben-David's "L'Attaché," filmed in 75018 (2.34443461, 48.88730126) from 2019-03-14 to 2019-03-14
Marc RECUENCO's "En attendant qui ? Mai," filmed in 75017 (2.30595278, 48.8835646) from 2019-06-11 to 2019-06-11
JEAN PASCAL ZADI ET JOHN WAXXX's "TOUT SIMPLEMENT NOIR," filmed in 75005 (2.35024547, 48.84859142) from 2019-05-23 to 2019-05-23
Nicolas Herdt's "Une famille formidable," filmed in 75003 (2.36365029, 48.8602504) from 2018-08-06 to 2018-08-06
Nicolas Herdt's "Une famille formidable," filmed in 75003 (2.3621555, 48.86295435) from 2018-08-06 to 2018-08-06
Maïmouna Doucouré's "Les Mignonnes," filmed in 75019 (2.38208807, 48.88213499) from 2018-08-07 to 2018-08-07
CHRISTOPHE BARRAUD's "LEBOWITZ CONTRE LEBOWITZ/9 A 12," filmed in 75013 (2.359355, 48.838779) from 2016-11-09 to 2016-11-09
NICOLAS HERDT's "LEO MATTEI/14 ET 15," filmed in 75004 (2.365669, 48.84726) from 2016-10-06 to 2016-

- The same movie can have multiple shooting locations. Make a list of movies, where each entry contains the movie title, its director, and its shooting locations (district, start date, end date).
- How many movies do you have?
- Write a function that converts a movie into a string that shows director, title, and shootings.
- Convert all movies into strings.
- Display the first 20 entries.

Answer

In [24]:
from typing import Any, Dict, List

# A Movie is a plain dictionary (see the schema in the explanation below)
Movie = Dict[str, Any]

# Regroup locations per movie, keyed by title
movies_by_title: Dict[str, Movie] = dict()

for loc in locs:
  fields = loc['fields']
  title = fields.get('nom_tournage', '<title missing>')
  if title not in movies_by_title:
    movies_by_title[title] = {
      'title': title,
      'director': fields.get('nom_realisateur', '<director missing>'),
      'shootings': []
    }
  movies_by_title[title]['shootings'].append({
    'district': fields.get('ardt_lieu', '<arrondissement missing>'),
    'start_date': fields.get('date_debut', '<start date missing>'),
    'end_date': fields.get('date_fin', '<end date missing>')
  })

# Only the grouped values are needed from here on
movies: List[Movie] = list(movies_by_title.values())

### Explanation

The question asks for two tasks to be accomplished:
1. Entries are grouped by which movie they are a part of
2. A subset of fields is displayed from each movie, including the newly aggregated field of shooting locations

The most straightforward way to create this aggregation is via a dictionary. The movie title is chosen as the key, as there are no better unique identifier fields referencing the movie itself. 

While this organization is being done, the opportunity is taken to normalize the data into a new structure containing exactly what we need, and with no fields missing:

A `Movie` is a dictionary with the schema:

```json
{
  "title": "string",
  "director": "string",
  "shootings": [
    {
      "district": "string",
      "start_date": "string",
      "end_date": "string"
    },
    ...
  ]
}
```

Since the top-level dictionary was only needed for the process of organization, and not for the final data representation, we re-organize all of its values into a list for the final `movies` variable.

In [25]:
len(movies)

1476

In [26]:
def display_movie(movie):
    movie_str = f"{movie['director']}'s \"{movie['title']},\" was filmed in the following locations:\n"
    for shooting in movie['shootings']:
        # Double quotes outside: re-using the same quote inside an f-string
        # is a syntax error before Python 3.12
        movie_str += f"- {shooting['district']} between {shooting['start_date']} and {shooting['end_date']}\n"
    return movie_str

In [63]:
all_movie_displays = [display_movie(m) for m in movies]
print('\n'.join(all_movie_displays[:20]))

ANNE FONTAINE's "POLICE," was filmed in the following locations:
- 75012 between 2019-03-08 and 2019-03-09
- 75012 between 2019-03-08 and 2019-03-09
- 75012 between 2019-04-10 and 2019-04-11
- 75012 between 2019-03-11 and 2019-03-12
- 75020 between 2019-03-27 and 2019-03-27
- 75012 between 2019-03-07 and 2019-03-08
- 75011 between 2019-03-27 and 2019-03-27
- 75019 between 2019-03-28 and 2019-03-28
- 75012 between 2019-03-25 and 2019-03-25
- 75012 between 2019-03-28 and 2019-03-28
- 75019 between 2019-04-08 and 2019-04-09

Eli Ben-David's "L'Attaché," was filmed in the following locations:
- 75018 between 2019-03-14 and 2019-03-14
- 75018 between 2019-03-14 and 2019-03-14
- 75005 between 2019-03-15 and 2019-03-15
- 75012 between 2019-03-12 and 2019-03-12
- 75009 between 2019-03-12 and 2019-03-12
- 75001 between 2019-03-12 and 2019-03-12
- 75004 between 2019-03-20 and 2019-03-20
- 75001 between 2019-03-15 and 2019-03-16
- 75004 between 2019-03-16 and 2019-03-16
- 75005 between 2019-03-12

- Display for each district its number of shootings. 

Answer

In [27]:
from typing import Dict

def district_count_reducer(acc: Dict[str, int], movie: Movie) -> Dict[str, int]:
  for shooting in movie['shootings']:
    d = shooting['district']
    if d not in acc:
      acc[d] = 0
    acc[d] += 1
  return acc

stats = functools.reduce(district_count_reducer, movies, dict())

stats

{'75012': 596,
 '75020': 587,
 '75011': 641,
 '75019': 745,
 '75018': 1043,
 '75005': 640,
 '75009': 642,
 '75001': 722,
 '75004': 670,
 '75010': 749,
 '75007': 657,
 '75003': 236,
 '75017': 378,
 '75013': 658,
 '75008': 798,
 '75015': 363,
 '75002': 297,
 '75006': 471,
 '75116': 421,
 '75016': 614,
 '75014': 321,
 '94320': 4,
 '<arrondissement missing>': 1,
 '93500': 6,
 '93320': 1,
 '92220': 1,
 '92170': 1,
 '93200': 1,
 '93000': 1}

### Explanation

This exercise asks to transform an array of `Movie`s to a hash mapping a piece of information within a `Movie` to an integer counting occurrences. This is a prime use-case for `reduce`, as we are changing the data type.

We call `reduce` on `movies` with a reducer function and an empty dictionary as the initial value. The data to count for each `Movie` is under the `'shootings'` key, which holds a list. We therefore loop over it and, for each shooting location, increment the accumulator entry for its `'district'`.
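
The same counting can also be sketched with `collections.Counter` from the standard library, shown here as an alternative rather than as the notebook's method. `sample_movies` is hypothetical data that follows the `Movie` schema, reduced to the keys the count needs.

```python
from collections import Counter

# Hypothetical sample following the Movie schema.
sample_movies = [
    {'title': 'A', 'director': 'X', 'shootings': [{'district': '75012'}, {'district': '75020'}]},
    {'title': 'B', 'director': 'Y', 'shootings': [{'district': '75012'}]},
]

# Flatten every shooting of every movie and count the districts.
district_counts = Counter(
    shooting['district']
    for movie in sample_movies
    for shooting in movie['shootings']
)
print(district_counts.most_common())  # [('75012', 2), ('75020', 1)]
```

`most_common()` additionally returns the districts sorted by decreasing count, which the plain dictionary does not.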

# Exercise 4 - Analyze CSV

- Write Python code that retrieves the file of the most loaned titles in libraries in Paris at: 

In [28]:
url = "https://opendata.paris.fr/explore/dataset/les-titres-les-plus-pretes/download/?format=csv&timezone=Europe/Berlin&lang=en&use_labels_for_header=true&csv_separator=%3B"

Answer

In [29]:
from requests import Session
from io import StringIO
import csv

s = Session()
data = s.get(url).text

### Explanation

This code retrieves the CSV data from the provided URL using the `requests` library through a `Session`, which handles persistent connections. The response body is stored as a CSV string in the `data` variable. 

- Analyze the resulting CSV file to display, for all entries: title, author, and total number of loans.

Answer

In [34]:
print(data[:600])
print(f'total len of data: {len(data)}')

books = [] # Save all retrieved data
with StringIO(data) as csvfile:
    r = csv.reader(csvfile, delimiter=';')
    for i, row in enumerate(r):
        if i == 0: 
            # ignore first row: column names
            continue
        book = {
            "Type": row[0],
            "Loans": int(row[1]),
            "Title": row[2],
            "Author": row[3],
            "Area": row[4],
            "Total_Loans": int(row[5]),
            "Total_Copies": int(row[6])
        }
        books.append(book)

print(f'total entries: {len(books)}')

def disp_book(book):
    title = book['Title']
    author = book['Author']
    loans = book['Total_Loans']
    return f'"{title}", by {author} ({loans} loans)'

Type de document;Prêts 2022;Titre;Auteur;Nombre de localisations;Nombre de prêt total;Nombre d'exemplaires
Bande dessinée jeunesse;1064;Razzia;Sobral,  Patrick;47;2938;67
Bande dessinée jeunesse;1024;Touche pas à mon veau;Guibert,  Emmanuel;45;2296;71
Bande dessinée jeunesse;1016;Max et Lili vont chez papy et mamie;Saint-Mars,  Dominique de;50;5554;103
Bande dessinée jeunesse;938;Lili veut un petit chat;Saint-Mars,  Dominique de;51;5789;80
Bande dessinée jeunesse;921;Max et Lili font du camping;Saint-Mars,  Dominique de;52;5658;83
Bande dessinée jeunesse;901;Lili trouve sa maîtresse méch
total len of data: 66061
total entries: 842


### Explanation

To inspect the data, the first 600 characters are printed along with the total length of the dataset. The header shows 7 columns: `Type de document`, `Prêts 2022`, `Titre`, `Auteur`, `Nombre de localisations`, `Nombre de prêt total`, `Nombre d'exemplaires`.

Next, the CSV data is parsed with Python's `csv.reader`, and a list of dictionaries (`books`) is built to store the processed records. Each row after the header is treated as a separate book record. Since the delimiter is `;` rather than the default `,`, it is passed explicitly through the `delimiter` parameter described in the [`csv` documentation](https://docs.python.org/3/library/csv.html#csv-fmt-params).

The first row, which contains the column headers, is skipped using `if i == 0`. Then, for each subsequent row, a dictionary is created with the following keys:

- Type: Type de document
- Loans: Prêts 2022 
- Title: Titre
- Author: Auteur
- Area: Nombre de localisations
- Total_Loans: Nombre de prêt total
- Total_Copies: Nombre d'exemplaires
  
Each of these dictionaries is appended to the `books` list. Finally, the total number of books processed is printed using `len(books)`.

Since each entry corresponds to a dict element stored in the `books` list, displaying the title, author, and total number of loans is just looking up the relevant keys in each dict.
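
As an alternative to indexing columns by position, `csv.DictReader` uses the header row as keys. The sketch below is only an illustration: the `SAMPLE` string copies the header and two rows from the printed output above instead of downloading the file, and the keys are therefore the original French column names.

```python
import csv
from io import StringIO

# Header and two rows copied from the printed output.
SAMPLE = (
    "Type de document;Prêts 2022;Titre;Auteur;Nombre de localisations;"
    "Nombre de prêt total;Nombre d'exemplaires\n"
    "Bande dessinée jeunesse;1064;Razzia;Sobral,  Patrick;47;2938;67\n"
    "Bande dessinée jeunesse;1024;Touche pas à mon veau;Guibert,  Emmanuel;45;2296;71\n"
)

with StringIO(SAMPLE) as csvfile:
    # DictReader consumes the header row and uses it as the keys of each record.
    rows = list(csv.DictReader(csvfile, delimiter=';'))

for row in rows:
    print(row['Titre'], row['Auteur'], int(row['Nombre de prêt total']))
```

With this approach the header no longer has to be skipped by hand, at the cost of keeping the original column names as keys.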

In [35]:
print('\n'.join( [disp_book(b) for b in books[:20]]))

"Razzia", by Sobral,  Patrick (2938 loans)
"Touche pas à mon veau", by Guibert,  Emmanuel (2296 loans)
"Max et Lili vont chez papy et mamie", by Saint-Mars,  Dominique de (5554 loans)
"Lili veut un petit chat", by Saint-Mars,  Dominique de (5789 loans)
"Max et Lili font du camping", by Saint-Mars,  Dominique de (5658 loans)
"Lili trouve sa maîtresse méchante", by Saint-Mars,  Dominique de (4694 loans)
"J'irai où tu iras", by Lyfoung,  Patricia (4707 loans)
"Les nerfs à vif", by Nob (2837 loans)
"Je crois que je t'aime", by Lyfoung,  Patricia (3878 loans)
"Attention tornade", by Cazenove,  Christophe (2366 loans)
"Max et Lili se posent des questions sur Dieu", by Saint-Mars,  Dominique de (4823 loans)
"Game over. 13. Toxic affair", by Midam (2652 loans)
"Les Schtroumpfs et la tempête blanche", by Jost,  Alain (975 loans)
"On a marché sur la lune", by Hergé (5674 loans)
"Astérix chez les Bretons", by Goscinny,  René (3014 loans)
"Parvati", by Ogaki,  Philippe (2616 loans)
"Les Schtroumpf

- Display for each type of document (there can be several entries for the same type of document), the total number of loans for this type. 

Answer

In [32]:
stats = {}
for b in books:
    if b['Type'] not in stats:
        stats[b['Type']] = b['Total_Loans']
    else:
        stats[b['Type']] += b['Total_Loans']
stats

{'Bande dessinée jeunesse': 2300143,
 'Livre adulte': 41731,
 'Bande dessinée adulte': 59726,
 'Livre sonore jeunesse': 10630,
 'Livre jeunesse': 104067,
 'Bande dessinée ado': 29819,
 'DVD jeunesse': 2471,
 'Jeux vidéos tous publics Non prêtables': 4235,
 'Jeux de société prêtable': 10057,
 'Musique jeunesse': 4792,
 'Jeux de société': 1753}

### Explanation

This code calculates the total number of loans for each document type. It loops through the `books` list and checks if the document type (`b['Type']`) is already in the `stats` dictionary. If not, it adds the type and sets the total loans to the current book’s `Total_Loans`. If the type is already in `stats`, it adds the current book's loans to the existing total. After the loop, `stats` contains the total number of loans for each document type.

- Display titles in order of profitability (in descending order of the number of loans per copy).

In [33]:
def disp_book(book):
    title = book['Title']
    author = book['Author']
    loans = book['Total_Loans']
    copies = book['Total_Copies']
    if author:
        return f'"{title}", by {author} ({loans} loans, {copies} copies)'
    else: 
        return f'"{title}" ({loans} loans, {copies} copies)'
    

for b in books:
    b['Profitability'] = b['Total_Loans'] / b['Total_Copies']

sorted_books = sorted(books, key=lambda x: x['Profitability'], reverse=True)
print('\n'.join( [disp_book(b) for b in sorted_books[:20]]))

"Console Nintendo Switch" (1648 loans, 2 copies)
"Console PlayStation 4" (2587 loans, 6 copies)
"SOS ouistiti :" (1868 loans, 5 copies)
"Quatre en ligne :" (1753 loans, 5 copies)
"Perplexus : : original" (2254 loans, 8 copies)
"Un enfant chez les schtroumpfs", by Díaz Vizoso,  Miguel (4504 loans, 43 copies)
"Mon meilleur ami", by Verron,  Laurent (4662 loans, 47 copies)
"Les vacances infernales", by Cohen,  Jacqueline (5014 loans, 51 copies)
"Bande de sauvages !", by Cohen,  Jacqueline (5761 loans, 60 copies)
"Trop, c'est trop !", by Cohen,  Jacqueline (4504 loans, 47 copies)
"Les fous du mercredi", by Cohen,  Jacqueline (5169 loans, 54 copies)
"Ca va chauffer !", by Cohen,  Jacqueline (4071 loans, 44 copies)
"Uno :" (3136 loans, 34 copies)
"Ca roule !", by Cohen,  Jacqueline (5763 loans, 63 copies)
"Salut, les zinzins !", by Cohen,  Jacqueline (4565 loans, 50 copies)
"Les deux terreurs", by Cohen,  Jacqueline (3999 loans, 44 copies)
"Subliiiimes !", by Cohen,  Jacqueline (5007 loans, 

### Explanation

In this code, the `disp_book` function has been slightly modified to handle books that don't have an author. The function now displays the title, total loans, and total copies, and if the book has an author, it includes the author in the output. If the author is missing, it only displays the title, loans, and copies.

The books are sorted by a new field called `Profitability`, which is calculated as the ratio of `Total_Loans` to `Total_Copies` for each book. The books with higher profitability are ranked higher.

The `sorted` function, found through a search on Stack Overflow (https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value), sorts the `books` list. Its `key` parameter specifies a function called on each element to obtain the value to sort by; here the lambda `lambda x: x['Profitability']` takes a book (`x`) and returns its `Profitability`. Passing `reverse=True` sorts in descending order, so the most profitable books appear first.

Finally, the top 20 most profitable books are displayed using the `disp_book` function.
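
The ranking step can be illustrated on a toy list. This is a sketch under stated assumptions: `books_sample` and `loans_per_copy` are hypothetical, and a title with zero copies is assumed to rank last instead of raising `ZeroDivisionError` (the real dataset may never contain such a row).

```python
books_sample = [
    {'Title': 'A', 'Total_Loans': 100, 'Total_Copies': 10},
    {'Title': 'B', 'Total_Loans': 90, 'Total_Copies': 3},
    {'Title': 'C', 'Total_Loans': 5, 'Total_Copies': 0},
]

def loans_per_copy(book: dict) -> float:
    # Assumption: no recorded copies means the title is ranked last.
    copies = book['Total_Copies']
    return book['Total_Loans'] / copies if copies else float('-inf')

# key= computes the sort value per book; reverse=True gives descending order.
ranked = sorted(books_sample, key=loans_per_copy, reverse=True)
print([b['Title'] for b in ranked])  # ['B', 'A', 'C']
```

Passing the function as `key` avoids storing a `Profitability` field on each record, which may or may not be preferable depending on whether the ratio is needed again later.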

# Exercise 5 * - Analyze HTML

- Write a Python program that gets the content of the Wikipedia page at: 

In [35]:
url = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population_density"

Answer

- Display all the countries mentioned in the table. 

Answer

In [None]:
countries

- Display for each country its rank, density, population, area. 

Answer

- Save the information obtained in a Python dictionary. 

Answer

- Using the previously saved Python dictionary, ask the user for a country, display the 
corresponding information.

Answer

# Exercise 6 * - API Web

- Write a Python program that will make available a Web API allowing elementary calculations on 
integers.

The APIs are accessible by GET and in the form: 
- /add/{integer1}/{integer2}: add integer1 and integer2
- /sub/{integer1}/{integer2}: perform the subtraction of integer1 and integer2
- /mul/{integer1}/{integer2}: carry out the multiplication of integer1 and integer2
- /div/{integer1}/{integer2}: perform the integer division of integer1 by integer2
- /mod/{integer1}/{integer2}: compute the remainder of the integer division of integer1 by integer2
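
The routes listed above share one shape, so the calculation itself can be sketched independently of any web framework. The names `OPERATIONS` and `compute` are hypothetical; a route handler would be expected to call something like `compute` and turn a `ValueError` into an HTTP error response. Integer division and remainder follow Python's `//` and `%` semantics here, which is an assumption about what the exercise intends for negative numbers.

```python
import operator
from typing import Callable, Dict

# Hypothetical dispatch table: URL verb -> integer operation.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    'add': operator.add,
    'sub': operator.sub,
    'mul': operator.mul,
    'div': operator.floordiv,
    'mod': operator.mod,
}

def compute(path: str) -> int:
    """Evaluate a path such as '/add/2/3'; raise ValueError on bad input."""
    parts = path.strip('/').split('/')
    if len(parts) != 3 or parts[0] not in OPERATIONS:
        raise ValueError(f'unsupported path: {path!r}')
    # int() itself raises ValueError for non-integer operands.
    op, a, b = parts[0], int(parts[1]), int(parts[2])
    if op in ('div', 'mod') and b == 0:
        raise ValueError('division by zero')
    return OPERATIONS[op](a, b)

print(compute('/mul/6/7'))   # 42
print(compute('/div/42/8'))  # 5
print(compute('/mod/42/8'))  # 2
```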

Answer

In [None]:
app.run(host='localhost', port=8080)

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://localhost:8080
Press CTRL+C to quit
127.0.0.1 - - [11/Oct/2024 09:03:41] "GET /mod/42/8 HTTP/1.1" 200 -


http://localhost:8080/mul/6/7

http://localhost:8080/div/42/8

http://localhost:8080/mod/42/8

- Write a Python program that will test the web API made available through the requests
library. 

Answer