# Python for Spatial Analysis
## Second part of the module of GG3209 Spatial Analysis with GIS.
### Notebook to practice NumPy and Pandas - Exercises

---
Dr Fernando Benitez -  University of St Andrews - School of Geography and Sustainable Development - First Iteration 2023 v.1.0 

# Practicing NumPy

## 2.0 
Import numpy under the alias `np`.

In [1]:
import numpy as np

## 2.1 

Create the following arrays:

* Create an array of 10 ones.

* Create an array of the integers 1 to 20.

* Create a 5 x 5 matrix of ones with a dtype int.

In [8]:
ones = np.ones((10))
arr1 = np.linspace(1, 20, 20)
matrix1 = np.ones((5, 5), dtype="int")

print(ones)
print(arr1)
print(matrix1)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18.
 19. 20.]
[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


## 2.2 
Use numpy to:
1. Create an 3D matrix of 3 x 3 x 3 full of random numbers drawn from a standard normal distribution (hint: `np.random.randn()`)
2. Reshape the above array into shape (27,)

In [None]:
matrix2 = np.random.randn(3, 3, 3)

# 2.3 

Create an array of 20 linearly spaced numbers between 1 and 10.

In [7]:
arr23 = np.linspace(1, 10, 20)
print(arr23)

[ 1.          1.47368421  1.94736842  2.42105263  2.89473684  3.36842105
  3.84210526  4.31578947  4.78947368  5.26315789  5.73684211  6.21052632
  6.68421053  7.15789474  7.63157895  8.10526316  8.57894737  9.05263158
  9.52631579 10.        ]


## 2.4

Run the following code to create an array of shape 4 x 4 and then use indexing to produce the outputs shown below.

In [12]:
import numpy as np
a = np.arange(1, 26).reshape(5, -1)
print(a)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]


In [15]:
a[3][-1]

20

In [23]:
a[1:,3:]

array([[ 9, 10],
       [14, 15],
       [19, 20],
       [24, 25]])

```python
array([[ 9, 10],
       [14, 15],
       [19, 20],
       [24, 25]])
```

In [21]:
a[1,:]

array([ 6,  7,  8,  9, 10])

```python
array([ 6,  7,  8,  9, 10])
```

In [None]:
# Code comes here

## 2.5 

Calculate the sum of all the numbers in `a`.

In [24]:
sum = np.sum(a)
print(sum)

325


## 2.6 

Calculate the sum of each row in `a`.

In [27]:
for row in a:
    print(np.sum(row))

for column in a:
    print(np.mean(column))

15
40
65
90
115
3.0
8.0
13.0
18.0
23.0


## 2.7 

Extract all values of `a` greater than the mean of `a` (hint: use a boolean mask).

In [35]:
mean = np.mean(a)
for i in a:
    if i > mean:
        print(i)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

# Practicing Pandas

## 2.8

In this set of practice exercises we'll be investigating the carbon footprint of different foods. 

We'll be leveraging a dataset compiled by [Kasia Kulma](https://r-tastic.co.uk/post/from-messy-to-tidy/) and contributed to [R's Tidy Tuesday project](https://github.com/rfordatascience/tidytuesday).

Start by importing pandas with the alias `pd`.

In [36]:
import pandas as pd

## 2.9 

The dataset we'll be working with has the following columns:

|column      |description |
|:-------------|:-----------|
|country       | Country Name |
|food_category | Food Category |
|consumption   | Consumption (kg/person/year) |
|co2_emmission | Co2 Emission (Kg CO2/person/year) |


Import the dataset as a dataframe named `df` from this url: <https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv>

In [37]:
df = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv       ", sep=",", header=0, encoding="ISO-8859-1")

## 2.10 

How many rows and columns are there in the dataframe?

In [39]:
print(len(df))
df.head(10)

1430


Unnamed: 0,country,food_category,consumption,co2_emmission
0,Argentina,Pork,10.51,37.2
1,Argentina,Poultry,38.66,41.53
2,Argentina,Beef,55.48,1712.0
3,Argentina,Lamb & Goat,1.56,54.63
4,Argentina,Fish,4.36,6.96
5,Argentina,Eggs,11.39,10.46
6,Argentina,Milk - inc. cheese,195.08,277.87
7,Argentina,Wheat and Wheat Products,103.11,19.66
8,Argentina,Rice,8.77,11.22
9,Argentina,Soybeans,0.0,0.0


## 2.11

What is the mean `co2_emission` of the whole dataset?

In [41]:
df["co2_emmission"].mean()

74.383993006993

## 2.12

What is the maximum `co2_emmission` in the dataset and which food type and country does it belong to?

In [42]:
df["co2_emmission"].max()

1712.0

## 2.13

How many countries produce more than 1000 Kg CO2/person/year for at least one food type?

In [46]:
high_co2 = df[df["co2_emmission"] > 1000]
high_co2.head(10)
print(len(high_co2))

5


## 2.14

Which country consumes the least amount of beef per person per year?


In [49]:
beef = df.query('food_category=="Beef"')
beef.head(10)
beef["consumption"].min()

0.7799999999999999

## 2.15

What is the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?

In [52]:
meat = df.query('food_category=="Pork" or food_category=="Poultry" or food_category=="Fish" or food_category=="Lamb" or food_category=="Goat" or food_category=="Beef"') 
meat.head(10)
meat["consumption"].sum()

8677.94

# 2.16 

What is the total emissions of all other (non-meat) products in the dataset combined?

## Well done!

That's all from this week, if you have any questions, do not forget to ask the TAs for assistance.