In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab04B.ipynb")

---

<h3><center>E7 - Introduction to Programming for Scientists and Engineers</center></h3>

<h2><center>Lab session #04-B <br></center></h2>

<h1><center>Reading from and writing to files<br></center></h1>

---

In [None]:
from resources.hashutils import *
from pathlib import Path
import numpy as np
import csv

## Question 1: The `pathlib` module


## Question 1.1

Use the `pathlib` module to create the folder structure shown below. 

<img src="resources/dir_tree.png" width="150" />

Here "cwd" refers to the current working directory, returned by `Path.cwd()`. `d1` and `d2` are directories. `f1.txt`, `f1.csv`, and `f2.csv` are empty files. 

**Hints**:
+ Create a directory with [`Path.mkdir()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir)
+ Create a file with [`Path.touch()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.touch)
+ Before calling `mkdir()` or `touch()`, check that the path does not already exist (using [`Path.exists()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.exists)).
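The hints above combine into a pattern like the one below. This is a minimal sketch, not the answer to Question 1.1. It works inside a throwaway directory created with `tempfile.TemporaryDirectory()`, and the names `demo_dir` and `notes.txt` are hypothetical placeholders, not the structure shown in the figure.

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so nothing in the lab folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Hypothetical names, chosen only for illustration.
    sub = base / 'demo_dir'
    note = sub / 'notes.txt'

    # Create the directory only if it is not already there.
    if not sub.exists():
        sub.mkdir()

    # Create an empty file only if it is not already there.
    if not note.exists():
        note.touch()

    # Record the results before the temporary directory is deleted.
    dir_ok = sub.is_dir()
    file_ok = note.is_file()
    file_size = note.stat().st_size

print(dir_ok, file_ok, file_size)  # prints: True True 0
```

`Path.mkdir()` also accepts `exist_ok=True` (and `parents=True` for nested folders), which is an alternative to the explicit `exists()` check.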

In [None]:
...

In [None]:
grader.check("q1p1")

## Question 1.2

Write a function called `count_file_types` that takes as input a folder (a `Path` object), and returns a dictionary with the number of files of each type in the folder. That is, if the folder contains 2 txt files and 3 pdf files, then the returned dictionary will be
```python
{'.txt':2, '.pdf':3}
```
**Hints**:
+ Use `for p in folder.iterdir()` to iterate through the items in the folder. 
+ Use `is_file()` to check that an item is a file (and not a subdirectory).
+ Get a file's extension with `p.suffix`.
+ Check whether a string `a` is a key in a dictionary `A` with `a in A.keys()`.
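To see what `suffix` returns and how a dictionary can accumulate counts, here is a minimal sketch that runs on a hard-coded list of hypothetical file names rather than on a real folder. For that reason it uses `PurePath` instead of `iterdir()` and `is_file()`.

```python
from pathlib import PurePath

# Hypothetical file names, used instead of a real folder listing.
names = ['a.txt', 'b.pdf', 'c.txt', 'archive.tar.gz', 'README']

counts = {}
for name in names:
    ext = PurePath(name).suffix   # last extension only, or '' if there is none
    if ext in counts.keys():
        counts[ext] += 1          # seen before: increment
    else:
        counts[ext] = 1           # first time: start the count at 1

print(counts)  # {'.txt': 2, '.pdf': 1, '.gz': 1, '': 1}
```

Note that `suffix` returns only the last extension (`'.gz'` for `archive.tar.gz`) and an empty string for a name with no extension.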

In [None]:
def count_file_types(directory):
    pass

In [None]:
# Test your code on the resources folder for this lab.
# This folder contains 1 .py file, 2 .csv files, and 3 .png files.
count_file_types(Path.cwd()/'resources')

In [None]:
grader.check("q1p2")

## Question 2: The `csv` module

In this problem we will use the `csv` module to extract information from a file containing a list of the 1,000 largest cities in the world. If we load the file into a spreadsheet program (such as Excel), we see the first few rows of the table:

<img src="resources/cities.png" width="600" />

The first line of the file is the header line; we will skip that line when processing the file. The columns of the table are the city's name, its lat/lng coordinates, its country, and its population.

Our task is a little unusual. Given an integer `x`, we must find the number of cities whose population is a multiple of `x`, that is, those for which the population mod `x` equals zero. With `x=8` (spoiler alert!), the result turns out to be 315: 315 of the 1,000 largest cities have a population that is a multiple of 8. Tokyo is one of them, since 37,732,000 / 8 = 4,716,500 is a whole number.
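The divisibility test itself is a single use of Python's `%` (remainder) operator. A quick check with the Tokyo population quoted above:

```python
population = 37_732_000   # Tokyo's population, as quoted above
x = 8

# A number is a multiple of x exactly when the remainder of division by x is zero.
print(population % x)        # 0
print(population % x == 0)   # True
print(population // x)       # 4716500
```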

Implement this in a function called `population_is_multiple_of` that takes `x` as input, and returns an integer.

**Hint**
+ Use `next(csv_reader)` to skip the header line. 
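The `next()` hint can be illustrated with a small sketch. Assumption: the CSV text below is hypothetical and is read from memory with `io.StringIO` instead of from `1000cities.csv`, and the loop sums a column rather than testing divisibility.

```python
import csv
import io

# A tiny in-memory stand-in for a CSV file (hypothetical columns and rows).
text = "name,value\nalpha,12\nbeta,7\ngamma,30\n"

total = 0
with io.StringIO(text) as f:
    reader = csv.reader(f)
    header = next(reader)       # consume the header row: ['name', 'value']
    for row in reader:          # each remaining row is a list of strings
        total += int(row[1])    # convert the text field to an integer

print(header, total)            # ['name', 'value'] 49
```

When reading a real file, the `csv` documentation recommends opening it with `newline=''`.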

In [None]:
def population_is_multiple_of(x):
    inputfile = Path.cwd()/'resources'/'1000cities.csv' 
    num_cities = 0    # counter for cities with population that is multiple of x
    with open(inputfile,...) as f:
        ...  # write code to iterate through the file, 
             # counting up num_cities
    return num_cities

In [None]:
# Use this to check your result. You should get 315
population_is_multiple_of(8)

In [None]:
grader.check("q2")

## Question 3: Reading and writing files with NumPy

In Question 2 we used the `csv` module because the data we were loading contained both strings (cities and countries) and numbers (lat, lng, population). When the data is purely numerical, we can use the simpler approach provided by NumPy, which is what we do in this part.

## Question 3.1: Read a dataset using NumPy

The dataset concerns [smoke detection using ambient air sensors](https://www.kaggle.com/datasets/gauravduttakiit/sensorfusion-smoke-detection-classification) and consists of measurements of temperature, humidity, pressure, and the presence of various gases and compounds. The data file, `air_data.csv`, is in the `resources` folder. Here is a snapshot of the first few rows:

<img src="resources/air.png" width="1400" />

Notice that all of the columns are numerical, which makes this file a good candidate for NumPy. Notice also that the first row is a header.

Your first task is to load the data (ignoring the header) into a single two-dimensional NumPy array called `air_data`. This can be done with a single call to [`np.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html), using its input parameters to (a) set the delimiter to a comma and (b) skip the first row. Consult the [`np.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) documentation to see how to do this.
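Here is a minimal sketch of the `delimiter` and `skiprows` parameters in action, assuming a tiny hypothetical three-column table held in memory. `np.loadtxt` accepts any file-like object, so `io.StringIO` stands in for a real file here.

```python
import io
import numpy as np

# Hypothetical purely numerical CSV text with one header row.
text = "a,b,c\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# delimiter=',' splits each line on commas; skiprows=1 skips the header line.
data = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)

print(data.shape)   # (2, 3)
print(data)
```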




In [None]:
inputfile = Path.cwd()/'resources'/'air_data.csv'
air_data = ...

In [None]:
grader.check("q3p1")

## Question 3.2

Next, we might want to perform a calculation on the data and write the result to a NumPy binary file (.npy). Compute the average (also called the arithmetic [mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html)) of each column of `air_data`, and call the result `air_data_mean`. `air_data_mean` should contain 15 numbers, one for each column of `air_data`, so its shape should be `(15,)`. Then use [`np.save`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) to save `air_data_mean` to a NumPy binary file named `air_data_mean.npy` in your current working directory.
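The sketch below shows both steps on a small hypothetical array: `axis=0` makes `np.mean` average down each column, and `np.save`/`np.load` round-trip the result through a `.npy` file. The file name `demo_mean.npy` and the temporary directory are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
import numpy as np

# A small hypothetical 2-D array (3 rows, 2 columns).
arr = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

col_means = np.mean(arr, axis=0)   # axis=0 averages down each column
print(col_means.shape)             # (2,)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / 'demo_mean.npy'   # hypothetical file name
    np.save(out, col_means)             # write a binary .npy file
    loaded = np.load(out)               # read it back to check the round trip

print(np.array_equal(loaded, col_means))  # True
```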

In [None]:
air_data_mean = ...
outfile = ...
np.save(...)

In [None]:
grader.check("q3p2")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Make sure you submit the .zip file to Gradescope.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)