In this part the following modules will be covered:

- file handling: `os`, `pathlib`, `json`, `pickle`, `csv`
- container datatypes: `collections`
- iterators: `itertools`
- enumerations: `enum`
- regular expressions: `re`

## Part 2.2. Standard library

The task is the same: replace `...` (Ellipsis) symbols with suitable pieces of code. 

You can find all the files for this notebook in the `pt2.2_files` zip archive. Unzip it and place the folder in the same directory as the notebook.

In [None]:
FILES = 'pt2.2_files'

### `os`

You can use this module to access files and directories on your PC.

In [None]:
import os

`os.path.join`

This function joins path components. It is preferable to concatenating path strings with `+` because it inserts the separator used by the current OS (`/` on Linux and macOS, `\` on Windows).

Let's create a path to the `1.json` file located in the `json_files/` subdirectory of the `FILES` directory:

In [None]:
JSON_SUBDIR = 'json_files'; FILENAME = '1.json'
bad_way = FILES + '/' + JSON_SUBDIR + '/' + FILENAME
good_way = os.path.join(FILES, JSON_SUBDIR, FILENAME)
# note: there's even a better way to do this with pathlib

print(bad_way, good_way, sep='\n')
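To see the separator difference on a single machine, the standard library also exposes both platform implementations of `os.path` directly: `ntpath` (Windows) and `posixpath` (Linux/macOS). The sketch below compares them. The last line shows a common gotcha: an absolute component discards everything before it.

```python
import ntpath     # the Windows implementation of os.path (importable on any OS)
import os
import posixpath  # the Linux/macOS implementation of os.path

parts = ('pt2.2_files', 'json_files', '1.json')

# each implementation inserts its own separator between the components
windows_path = ntpath.join(*parts)
posix_path = posixpath.join(*parts)
print(windows_path)  # pt2.2_files\json_files\1.json
print(posix_path)    # pt2.2_files/json_files/1.json

# os.path is whichever of the two matches the OS the code runs on
print(os.path.join(*parts) in (windows_path, posix_path))  # True

# gotcha: a component that is itself absolute discards everything before it
absolute_reset = posixpath.join('pt2.2_files', '/tmp', 'x.json')
print(absolute_reset)  # /tmp/x.json
```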

`os.listdir`

Use this function to get a list of the names of all entries (files and subdirectories) in a directory.

In [None]:
def all_json_files() -> list[str]:
    '''
    Return a list of all .json filenames from the `json_files/` directory.
    '''
    json_dir = os.path.join(FILES, JSON_SUBDIR)
    return [filename for filename in os.listdir(json_dir) if filename.endswith('.json')]

In [None]:
# test
res = all_json_files()
assert res, 'Empty list of filenames'
assert all(el.endswith('.json') for el in res), 'Contains non-json file formats'
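`os.listdir` returns bare names rather than full paths, in no guaranteed order, and it includes subdirectories. That is why the names have to be joined back onto the directory before a file can be opened or checked. Here is a minimal sketch on a throwaway directory from `tempfile`, with made-up file names:

```python
import os
import tempfile

# build a temporary directory with two files and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.json', 'b.txt'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('{}')
    os.mkdir(os.path.join(tmp, 'nested'))

    # listdir returns names only, in arbitrary order, files and folders alike
    entries = sorted(os.listdir(tmp))
    print(entries)  # ['a.json', 'b.txt', 'nested']

    # to keep regular files only, check each name joined with its directory
    files_only = sorted(
        name for name in entries if os.path.isfile(os.path.join(tmp, name))
    )
    print(files_only)  # ['a.json', 'b.txt']
```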


### `pathlib`

An even better way to deal with paths. This module stores paths not as strings but as `Path` objects.

In [None]:
from pathlib import Path

Use the intuitive division operator (`/`) to create child paths.

In [None]:
FILES_PL = Path('pt2.2_files'); JSON_SUBDIR = Path('json_files'); FILENAME = Path('1.json')

best_way = FILES_PL / JSON_SUBDIR / FILENAME
print(best_way)

You can also get useful information about the file or folder the path points to:

In [None]:
print('is directory (points to a folder):', best_way.is_dir())
print('is file:', best_way.is_file())
print('check if the file exists:', best_way.exists())
print('file\'s extension:', best_way.suffix)
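A path can also be split into its components without touching the disk. The sketch below uses `PurePosixPath` so that the printed output is the same on every OS. A regular `Path` exposes the same attributes.

```python
from pathlib import PurePosixPath

# a "pure" path only manipulates the path string and never checks the disk,
# so this works even if the file does not exist
p = PurePosixPath('pt2.2_files') / 'json_files' / '1.json'

print(p.name)                 # 1.json
print(p.stem)                 # 1
print(p.suffix)               # .json
print(p.parent)               # pt2.2_files/json_files
print(p.parts)                # ('pt2.2_files', 'json_files', '1.json')
print(p.with_suffix('.csv'))  # pt2.2_files/json_files/1.csv
```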

We can even open the file with the `.open` method:

In [None]:
with best_way.open() as f:
    print(f.read())

You can use `Path` objects wherever string paths are accepted (for example, in `open()` or the `os.path` functions). However, they come with added functionality.

In [None]:
# return the absolute path
abs_path = best_way.absolute()
print(abs_path)

In [None]:
# return the contents of a directory matching a given pattern
json_folder = FILES_PL / JSON_SUBDIR
print('all files:', list(json_folder.glob('*.*')))
print('json only:', list(json_folder.glob('*.json')))
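`glob` only matches entries directly inside the folder. `rglob` also descends into subfolders and behaves like `glob` with `**/` prepended to the pattern. Here is a small sketch on a throwaway directory with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for rel in ('a.json', 'b.csv', 'sub/c.json'):
        (root / rel).write_text('{}')

    # glob: top level of `root` only
    top = sorted(p.name for p in root.glob('*.json'))
    # rglob: same pattern, but searched recursively (like glob('**/*.json'))
    deep = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.json'))

print(top)   # ['a.json']
print(deep)  # ['a.json', 'sub/c.json']
```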

Further reading: https://docs.python.org/3/library/pathlib.html

### `json`

JSON format is widely used to serialize dictionary-like data.

In [None]:
import json
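JSON has fewer types than Python (strings, numbers, booleans, `null`, arrays and objects), so a round trip through `json.dumps`/`json.loads` does not always return an equal object. The sketch below uses a made-up record to show this. Tuples come back as lists, and non-string keys come back as strings, which is why keys such as years need to be converted back with `int(...)`.

```python
import json

# made-up record: a tuple value and an integer key
record = {'school_name': 'HSE', 'grades': (5, 4, 5), 1880: -0.07}

text = json.dumps(record)   # dict -> str
print(text)  # {"school_name": "HSE", "grades": [5, 4, 5], "1880": -0.07}

restored = json.loads(text)  # str -> dict
print(restored == record)    # False: the tuple became a list, 1880 became "1880"
```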

Read the `1.json` file and return its contents as a `dict` object.

In [None]:
def simple_read_json() -> dict:
    '''
    Returns a dictionary generated from the contents of ./pt2.2_files/json_files/1.json.
    '''
    with open(os.path.join(FILES, JSON_SUBDIR, '1.json')) as f:
        return json.load(f)

In [None]:
# test
res = simple_read_json()
assert res['school_name'] == 'HSE'
assert res['reg_code'][:4] == 'dbf3'
assert len(res['grades']) == 26

Do the tests pass?

If so, use the `all_json_files` function to get all JSON filenames, open the files one by one, collect a summary of each into one `list` of `dict`s, and save it to a file called `summary.json` in the `pt2.2_files/output/` directory.

The summary object should have the following format:

```
summary = [
    {
        'school_name': ...,
        'reg_code_7': ...,
        'average_grade': ...
    },
    ...
]
```

That is, a list of dictionaries with the following keys:
- `school_name`;
- `reg_code_7` (the first 7 characters of the original `'reg_code'`);
- `average_grade` (the arithmetic mean of the original `'grades'` list, rounded to 3 decimal places).

In [None]:
from statistics import mean # use this instead of sum(grades)/len(grades)

def create_summary_object() -> list[dict[str, str | float]]:
    '''
    Create a summary object as described above
    in a form of a list of dictionaries.

    Hint 1: use `all_json_files` to access the filenames.
    '''
    json_files = all_json_files()
    summary = []
    for jf in json_files:
        with open(os.path.join(FILES, JSON_SUBDIR, jf), 'r') as f:
            raw_data = json.load(f)
        summary.append(
            {
                'school_name': raw_data['school_name'],
                'reg_code_7': raw_data['reg_code'][:7],
                'average_grade': round(mean(raw_data['grades']), 3)
            }
        )
    return summary

In [None]:
# test
res = create_summary_object()
assert isinstance(res, list), 'result must be a list object'
assert isinstance(res[0], dict), 'each element must be a dict object'
assert len(res) == 5
assert {el['reg_code_7'] for el in res} == {'dbf3fd7', 'e5dcffe', 'c701f30', '51c670b', 'bfe3a1c'}, \
    'Something wrong with the reg_code_7 fields'

Now, save this object in the `summary.json` file.

In [None]:
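SAVE_TO = os.path.join(FILES, 'output', 'summary.json')

# create the output/ folder if it does not exist yet
os.makedirs(os.path.dirname(SAVE_TO), exist_ok=True)

with open(SAVE_TO, 'w') as f:
    json.dump(res, f)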

In [None]:
# test
assert os.path.exists(SAVE_TO), f'{SAVE_TO} file does not exist'
with open(SAVE_TO) as f: assert res == json.load(f), 'Object saved and object created are not equal'
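By default, `json.dump`/`json.dumps` write everything on one line. Both accept `indent` for human-readable output and `sort_keys` for a deterministic key order. Here is a sketch with one made-up summary entry; `json.dump(res, f, indent=2)` works the same way when writing to a file.

```python
import json

# one made-up summary entry
summary = [{'school_name': 'HSE', 'reg_code_7': 'dbf3fd7', 'average_grade': 4.5}]

print(json.dumps(summary))  # compact: everything on a single line

# indent=2 pretty-prints nested structures; sort_keys=True orders keys alphabetically
pretty = json.dumps(summary, indent=2, sort_keys=True)
print(pretty)
```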

BONUS: reading JSON from URLs

You can also read JSON data directly from a URL:

In [None]:
from urllib.request import urlopen

URL = 'https://global-warming.org/api/ocean-warming-api'
response = urlopen(URL)

data = json.loads(response.read())
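`json.loads(response.read())` first reads all the bytes and then parses them. `json.load` does both steps for any object with a `.read()` method, and it accepts bytes as well as text. The sketch below substitutes `io.BytesIO` for the network response, so no internet access is needed. The payload is made up, and `data_demo` is a hypothetical name chosen so that `data` is not overwritten.

```python
import io
import json

# io.BytesIO stands in for the file-like, bytes-producing object urlopen returns
fake_response = io.BytesIO(b'{"description": "demo", "result": {"1880": "-0.07"}}')

data_demo = json.load(fake_response)  # reads the stream and parses it in one call
print(data_demo['result'])  # {'1880': '-0.07'}
```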

In [None]:
print(data)
print(data['description'])
print(f'in total: {len(data["result"])} temperature measurements')

In [None]:
def create_years_anomalies() -> tuple[list[int], list[float]]:
    '''
    Extract useful measurements from the `data` dictionary stored under the `'result'` key
    as another dictionary.
    Return a tuple of (list of years as ints, list of temperatures as floats)
    '''
    years = []; anomalies = []
    for y, a in data['result'].items():
        years.append(int(y))
        anomalies.append(float(a))
    return years, anomalies

In [None]:
yrs, anom = create_years_anomalies()

In [None]:
# test
import math
assert isinstance(yrs[0], int), 'Elements of yrs must be integers'
assert isinstance(anom[0], float), 'Elements of anom must be floats'
assert yrs == list(range(1880, 2023)), 'Incorrect list of years'
assert math.isclose(sum(anom), 10.44), 'Something wrong with the anom elements'

If the previous tests passed, you can plot a graph of "temperature anomaly by year" with the cell two cells below.

First, make sure that the `matplotlib` module is installed by running the next cell.

In [None]:
%pip install matplotlib

In [None]:
from matplotlib import pyplot as plt
from statistics import mean

# generating sliding (moving) average data:
# each point is the mean of a centered window of 2 * win_size + 1 values
anom_sliding = []
win_size = 5
for i in range(win_size, len(anom) - win_size):
    anom_sliding.append(mean(anom[i - win_size : i + win_size + 1]))

# plt.plot(yrs, [0] * len(yrs))
plt.plot(yrs, anom, ':.', label='data points')
plt.plot(yrs[win_size:-win_size], anom_sliding, label='sliding average')
plt.title('Temperature anomaly by year')
plt.xlabel('year')
plt.ylabel('anomaly, degrees Celsius')
plt.grid()
plt.legend()
plt.show()
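The smoothing step is a centered moving average. Below it is factored into a reusable function; the name `centered_moving_average` is hypothetical. A window of `2 * half_width + 1` points keeps each average centered on its own position. Points too close to either end are skipped, so the output is `2 * half_width` elements shorter than the input.

```python
from statistics import mean

def centered_moving_average(values: list[float], half_width: int) -> list[float]:
    '''
    Average each point with `half_width` neighbours on both sides.
    Points closer than `half_width` to either end are skipped.
    '''
    return [
        mean(values[i - half_width : i + half_width + 1])
        for i in range(half_width, len(values) - half_width)
    ]

# a single spike gets spread evenly over its neighbours
print(centered_moving_average([0.0, 0.0, 3.0, 0.0, 0.0], 1))  # [1.0, 1.0, 1.0]
```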

### `pickle`

See the `pickle_pickle_pickle.py` skeleton.
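For orientation before the skeleton, here is a minimal round-trip sketch with a made-up object. Unlike JSON, `pickle` preserves Python-specific types such as tuples, sets and integer keys. For files, use `pickle.dump`/`pickle.load` with files opened in binary mode (`'wb'`/`'rb'`). Only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

# made-up object mixing types that JSON cannot represent faithfully
obj = {'years': (1880, 1881), 'tags': {'ocean', 'temperature'}, 1880: -0.07}

blob = pickle.dumps(obj)        # object -> bytes
restored = pickle.loads(blob)   # bytes -> object

print(type(blob).__name__)  # bytes
print(restored == obj)      # True: tuples, sets and int keys survive
```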

### `csv`

CSV (comma-separated values) is a plain-text format for storing table-like data.

The first line usually contains the column names, separated by commas. Each following line holds one row of values for those columns. Take a look at the `pt2.2_files/csv_files/example.csv` file to get a better picture of how the data is stored.

In [None]:
import csv
CSV_SUBDIR = 'csv_files'

You can read a CSV file using `csv.DictReader`. It turns each row into a dictionary with the column names as keys:

In [None]:
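# newline='' is what the csv module documentation recommends for files read by csv readers
with open(os.path.join(FILES, CSV_SUBDIR, 'example.csv'), newline='') as f: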
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        print(row)

_Note_ that the empty value in the 4th row is read as an empty string (`''`).
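The same behaviour can be reproduced without the example file. The sketch below feeds `csv.DictReader` an in-memory string with made-up data via `io.StringIO`. It also shows that a quoted field may contain commas, and that every value comes back as a `str`, so numbers must be converted explicitly.

```python
import csv
import io

# io.StringIO stands in for an open .csv file; the contents are made up
raw = io.StringIO(
    'name,city,score\n'
    'Ann,"Moscow, RU",5\n'
    'Bob,,4\n'
)

rows = list(csv.DictReader(raw))
for row in rows:
    print(row)
# {'name': 'Ann', 'city': 'Moscow, RU', 'score': '5'}
# {'name': 'Bob', 'city': '', 'score': '4'}

# values are strings: convert them when numbers are needed
scores = [int(row['score']) for row in rows]
```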

You probably won't need the `csv` module on its own very often, because this format is supported by many third-party packages (such as `numpy` and `pandas`) that provide dedicated functions for reading and writing `.csv` files.

For example, here is how easy it is to read a matrix with `numpy`:

In [None]:
# numpy is a third-party package; make sure that it is installed on your PC
%pip install numpy

In [None]:
import numpy as np
MATRIX_FILEPATH = os.path.join(FILES, CSV_SUBDIR, 'matrix.csv')
matr = np.loadtxt(MATRIX_FILEPATH, dtype=int, delimiter=',')
print(matr)

### `enum`



### `itertools`



In [None]:
import math
from itertools import combinations

def pairwise_products(a: list[int]) -> int:
    #? remember this problem from pt1?
    # now solve it using itertools.combinations
    # note: this will be less efficient than the linear solution.
    '''*
    For a list of integers find the sum of all pairwise products.
    It is guaranteed that len(a) >= 2.
    Ex.: [2,5,4] -> 2*5 + 2*4 + 5*4 = 38
    '''
    # combinations(a, 2) yields every pair (a[i], a[j]) with i < j;
    # math.prod multiplies the two numbers of each pair
    return sum(map(math.prod, combinations(a, 2)))

In [None]:
# test pairwise_products
assert pairwise_products([2,5,4]) == 38
assert pairwise_products([1,2,3,4,5,6]) == 175
assert pairwise_products([1,2,0]) == 2
assert pairwise_products([5,3]) == 15
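For comparison, here is one possible linear-time approach, which is not necessarily the pt1 solution. Squaring the total gives `(sum a)^2 = sum(a_i^2) + 2 * (sum of pairwise products)`, so the answer follows from two sums. The function name `pairwise_products_linear` is hypothetical.

```python
def pairwise_products_linear(a: list[int]) -> int:
    '''
    Sum of a[i] * a[j] over all i < j in O(n) time, using
    (sum a)^2 = sum(a_i^2) + 2 * sum_{i<j} a_i * a_j.
    '''
    total = sum(a)
    squares = sum(x * x for x in a)
    # the difference equals twice the answer, so integer division is exact
    return (total * total - squares) // 2

print(pairwise_products_linear([2, 5, 4]))  # 38
```

Unlike `combinations`, which generates all n(n-1)/2 pairs, this version touches each element a constant number of times.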

### `collections`



### `re`

