In [1]:
# Always run this cell first: it imports the three libraries you need for the exercises
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# **Numpy**

Exercises 1 through 8

## Data Types

### Exercise 1

- Create a 5x5 [identity matrix](https://en.wikipedia.org/wiki/Identity_matrix).
- Convert it to `int64` dtype.
- Confirm its properties such as dimensionality, shape and data type.

## Arithmetic Operators

### Exercise 2
Generate a vector of size 10 with values ranging from 0 to 0.9, both included.

### Exercise 3
Change the value of `c` so that the following code runs properly.

_Hint: What happens when you compute `a*b`? What are the dimensions of the result? Use `.ndim` and `.shape`, and comment out some lines, to experiment with this code and figure this out._
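
If broadcasting is still a little fuzzy, here is a minimal sketch of how shapes combine. The arrays `col`, `row`, `fits` and `clash` are made up for illustration and are not the ones from the exercise.

```python
import numpy as np

col = np.array([[1], [2], [3]])   # shape (3, 1): a column
row = np.array([10, 20])          # shape (2,): a flat vector

prod = col * row                  # broadcast to shape (3, 2)
print(col.shape, row.shape, prod.shape)

# A vector whose length matches the last axis of prod can be added.
fits = np.array([100, 200])       # shape (2,)
total = prod + fits
print(total.shape)

# A vector of any other length (apart from length 1) cannot.
clash = np.array([1, 2, 3])       # shape (3,)
failed = False
try:
    prod + clash
except ValueError as err:
    failed = True
    print("Broadcast error:", err)
```

The thing to watch is the trailing axis: two shapes are compatible when, reading from the right, each pair of sizes is either equal or one of them is 1.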

In [None]:
a = np.asarray([[3], [7], [878], [26]])
b = np.asarray([1, 10, 11, 101, 110])
c = np.asarray([0, 1, 2, 3, 4, 5]) # Change this variable

(a*b) + c

## Indexing and Slicing

### Exercise 4
Create a 4 by 4 2D array with 1s on the border and 0s inside

## Whole array Functions

### Exercise 5
Create a random vector of size 30 and find its mean value

### Exercise 6

Subtract the mean of each column from a randomly generated matrix.
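
As a reminder of what "per column" means for whole-array functions, here is a small sketch of the `axis` argument on a made-up 2x2 matrix `m` (not a solution to the exercise).

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

col_means = m.mean(axis=0)   # collapse the rows: one value per column
row_means = m.mean(axis=1)   # collapse the columns: one value per row

print(col_means)
print(row_means)
```

`axis=0` runs down the rows, so the result has one entry per column.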

## Linear Algebra

### Exercise 7
Obtain the diagonal of the dot product of two random matrices.

## File I/O


### Exercise 8
Read in the file `daily_gas_price.csv`, which lists the daily price of natural gas since 1997. Each row contains a date and a price, separated by a comma. Find the minimum, maximum, and mean gas price over the dataset.

_Hint: you will need to use the delimiter option in `np.genfromtxt` to specify that data is separated by commas. We will be discarding the date column..._
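
To see what the `delimiter` option does without touching the real file, here is a rough sketch that reads a few made-up rows from memory. The sample values, the absence of a header line and the use of `usecols` to skip the date column are all assumptions for illustration; the actual file may be laid out differently.

```python
import io
import numpy as np

# Made-up rows in an assumed "date,price" layout with no header line.
sample = io.StringIO("2020-01-01,2.5\n2020-01-02,3.0\n2020-01-03,2.0\n")

# delimiter="," splits each row on commas; usecols=1 keeps only the
# second column, so the date strings are never parsed as numbers.
prices = np.genfromtxt(sample, delimiter=",", usecols=1)

print(prices.shape)
print(prices.min(), prices.max(), prices.mean())
```

If you leave out `usecols`, `np.genfromtxt` tries to read the dates as numbers too and fills that column with `nan`.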

# ____________________________________________

# **Plotting**

Exercises 1 through 3

## My first plot

### Exercise 1

Use Numpy to generate 2 different sequences of random numbers of the same length. Then plot these against one another as a scatter diagram.

## Annotation and Customisation

### Exercise 2

The following code reads in quarterly data on marriages and civil partnerships in Scotland between 2008 and 2018 (obtained from [NRS](https://www.nrscotland.gov.uk/statistics-and-data)).

In [None]:
# Read data, skipping header
raw = np.genfromtxt("data/marriage_data.csv", delimiter=",", skip_header=1)

# Separate data (columns known in advance)
opp_sex_marriage = raw[:, 0] # Opposite Sex Marriages
same_sex_marriage = raw[:, 1] # Same Sex Marriages
male_civil = raw[:, 2] # Male Civil Partnerships
female_civil = raw[:, 3] # Female Civil Partnerships

Plot a line chart showing the number of male civil partnerships, female civil partnerships, and same sex marriages over time. Use multiple calls to `plt.plot` to overlay these on the same figure. Give your plot a legend, a title, and x- and y-axis labels.

_Extra Challenge:_ Try changing the x-axis ticks to mark the start of each year (remember that this is quarterly data, so you'll want xticks spaced at intervals of 4 - you may need `np.arange` more than once...)
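
If the tick spacing is unclear, the idea can be sketched on a made-up series. Everything here (`counts`, the 12 quarters and the placeholder start year 2000) is hypothetical and has nothing to do with the marriage data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical quarterly series: 12 quarters of made-up counts.
n_quarters = 12
counts = np.arange(n_quarters) * 2.0

# One tick every 4 quarters, i.e. at the first quarter of each year.
tick_positions = np.arange(0, n_quarters, 4)

# Matching labels; the start year 2000 is an arbitrary placeholder.
tick_labels = [str(year) for year in np.arange(2000, 2000 + len(tick_positions))]

plt.plot(counts, label="made-up series")
plt.xticks(tick_positions, tick_labels)
plt.xlabel("Year")
plt.ylabel("Count")
plt.legend()
```

The first `np.arange` gives the tick positions and the second gives the labels, so both need to have the same length.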

## Figure Objects

### Exercise 3
Take a look at the file `my_graph.jpeg` produced by running this code block. Something is wrong: change the code block so that the output file is the same as the displayed graph.

In [None]:
x = np.arange(0.1, 10, 0.1)
plt.plot(x, np.log(x))
plt.show()
plt.savefig("my_graph.jpeg")

# ____________________________________________

# **Pandas**

Exercises 1 through 7

## Summarizing and computing descriptive stats 


### Exercise 1
A dataset of random numbers is created below. Obtain all rows from row 85 to row 97.

*Note: Remember that Python uses 0-based indexing*
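
Slicing by position and slicing by label treat the end point differently, which matters for a question like this one. Here is a small sketch on a made-up DataFrame called `small` (not the exercise data).

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(np.arange(12).reshape(6, 2))

by_position = small.iloc[1:4]   # positions 1, 2, 3: the end is excluded
by_label = small.loc[1:4]       # labels 1, 2, 3, 4: the end is included

print(len(by_position), len(by_label))
```

With the default integer index the labels and positions look the same, so it is easy to be off by one row.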

In [None]:
df = pd.DataFrame(np.reshape(np.arange(10000), (100,100)))


### Exercise 2
Create a (3,3) DataFrame and square all elements in it.

### Exercise 3

A random DataFrame is created below. Find the median value of each column.

In [None]:
df = pd.DataFrame(np.random.uniform(0, 10, (100, 100)))


## Data Loading and Storing

### Exercise 4
Load the file `data/homes.csv` and find the mean selling price of these houses.

## Data Cleaning - Handling Missing Data

### Exercise 5
Remove the missing data below using the appropriate method

In [None]:
data = pd.Series([1, None, 3, 4, None, 6])
data
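
Before removing or filling anything, it can help to check where the gaps are. This is a minimal sketch on a made-up Series called `toy`; it shows how `None` is stored and how to count missing entries, not how to solve the exercise.

```python
import pandas as pd

# None in a numeric Series is stored as NaN, so the dtype becomes float64.
toy = pd.Series([1, None, 3])
print(toy.dtype)

# isna() marks each missing entry with True.
mask = toy.isna()
print(mask.tolist())

# Summing the mask counts the missing values.
n_missing = int(mask.sum())
print(n_missing)
```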

### Exercise 6
That's fine if we want to remove missing data, but what if we want to fill it in instead? Do you know of a way? Try to fill in all of the missing values in the data below with **0s**.

In [None]:
data = pd.DataFrame([[1., 6.5, 3.], [2., None, None],
                    [None, None, None], [None, 1.5, 9.]])
data

## Data Transformation

### Exercise 7
Let's load our file of home prices again and filter the homes according to our preferences:
1. Load the file `data/homes.csv`.
2. The data contains some duplicates. Filter them out.
3. Let's say that the most we can spend on a house is £150. Keep only houses with a **sell**ing price below £150 and remove the rest.
4. Select only houses that have 4 or more bedrooms.
5. Select only houses that have 3 or more baths.

You should end up with only 2 houses.
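
The general pattern of dropping duplicates and then filtering with conditions can be sketched on a made-up table. The frame `toy` and its column names `price` and `rooms` are hypothetical; the columns in `data/homes.csv` will be named differently.

```python
import pandas as pd

# Hypothetical data: the second row duplicates the first.
toy = pd.DataFrame({"price": [100, 100, 200, 120],
                    "rooms": [3, 3, 5, 4]})

# Keep the first copy of each fully identical row.
unique = toy.drop_duplicates()

# A comparison gives a True/False mask; indexing with it keeps the True rows.
cheap = unique[unique["price"] < 150]

# Filters can be applied one after another.
roomy = cheap[cheap["rooms"] >= 4]

print(roomy)
```

Each step returns a new DataFrame and leaves the previous one untouched, so you can inspect the intermediate results as you go.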