# DS 3000/CS 3200 Lab 1

Due: Tuesday May 13 in-class

### Submission Instructions
Upload this `ipynb` file to GitHub, and then to Gradescope. To ensure that your submitted `ipynb` file represents your latest code, run a fresh `Kernel > Restart & Run All` just before uploading the file to GitHub, and verify that the correct version appears on Gradescope.

### Tips for success
- Collaborate: bounce ideas off of each other. If you are having trouble, you can ask your classmates or Dr. Gerber for help with specific issues; however...
- Follow the [academic integrity policy](http://www.northeastern.edu/osccr/academic-integrity): you are welcome to **talk about/discuss** the problems, but *not* to show your answers to each other or allow them to be copied.

# Part 1 (40 points): Intro to Markdown

Use Markdown below to create your own brief Wikipedia-style description of a topic related to International Government and Politics.

Your mini-wiki page must include:
- three headers: a title, subtitle and subsubtitle (the #, ##, ### syntax)
- an embedded image from a web address (use an [image hosting site](https://makeawebsitehub.com/free-photo-hosting/) if you'd like to upload your own)
- a **markdown** table of size at least 9 cells (i.e. 3 by 3, or 5 by 2)
- a list in **markdown**
- a link to another website

To practice typing in math mode, also include a LaTeX formula describing how your final grade is going to be calculated based on the syllabus, something like:

$$grade = weight_1*score_1 + weight_2*score_2$$

Please be **brief** in your text.  Aim for roughly 3 sentences total of text.

# Climate Change in Government and Politics
## A global issue requires global collaboration among governments.

<img src="https://rollcall.com/app/uploads/2022/01/climate_BC_134_091819.jpg" width="400" height="200">

### What is Climate Change?
**Climate change** is the gradual warming of the planet due to greenhouse gas emissions caused by anthropogenic activity.

Contributing activities include:
- fossil fuel burning
- deforestation

and more.

### How is it political?
To address climate change, governments must collaborate to devote resources to the problem; however, different countries are affected by climate change to differing extents and have different perspectives regarding the urgency of the threat. This often limits the action that is taken.

### What has been done?
| **Action**                  | **Description**                                                                                                                                       |
|-----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Paris Agreement             | A global treaty adopted in 2015, aiming to limit global temperature increase to well below 2°C.                                                        |
| EU Emissions Trading System | A market-based mechanism designed to reduce greenhouse gas emissions by setting a cap on the total amount of emissions allowed from specific sectors. |
| US Inflation Reduction Act  | Provides funding for energy security and climate change initiatives, including renewable energy tax credits.                                          |
| Kyoto Protocol              | An earlier, but now superseded, international agreement that set legally binding emissions reduction targets for developed countries.                 |

This [article](https://pmc.ncbi.nlm.nih.gov/articles/PMC7347262/) from the NIH details political events relating to climate change and public views on the subject.

$$grade = 0.4*homework + 0.2*labs + 0.2*quiz1 + 0.2*quiz2$$

# Part 2: Numpy
## Part 2.1: Creating Arrays (10 points)

Create the following two arrays using the NumPy library and then print them out. Call the first array `array_a` and the second array `array_b` (make sure you keep the `import` statement below):

$$\mathbf{array\_a} = \begin{bmatrix}3 & 8 & -2 & 3\\
0.5 & -1 & 6 & 4\\
-5 & 7 & -42 & 2
\end{bmatrix}$$

$$\mathbf{array\_b} = \begin{bmatrix}42 & 38 & 34\\
30 & 26 & 22\\
18 & 14 & 10\\
6 & 2 & -2\\
-6 & -10 & -14
\end{bmatrix}$$

In [1]:
# make sure to import numpy library
import numpy as np

In [2]:
array_a = np.array([[3, 8, -2, 3],
                    [.5, -1, 6, 4],
                    [-5, 7, -42, 2]])

array_b = np.array([[42, 38, 34],
                    [30, 26, 22],
                    [18, 14, 10],
                    [6, 2, -2],
                    [-6, -10, -14]])

In [3]:
# print array_a
array_a

array([[  3. ,   8. ,  -2. ,   3. ],
       [  0.5,  -1. ,   6. ,   4. ],
       [ -5. ,   7. , -42. ,   2. ]])

In [4]:
# print array_b
array_b

array([[ 42,  38,  34],
       [ 30,  26,  22],
       [ 18,  14,  10],
       [  6,   2,  -2],
       [ -6, -10, -14]])

## Part 2.2: Exploring Arrays (15 points)

1. Give the shape, size, ndim, and nbytes for each of the two arrays.
1. Take the transpose of both arrays. Call these `t_array_a` and `t_array_b`.
1. Try to add `array_a` and `t_array_b` (*prove* and *show* you did this with commented out code), then remove the last column of `t_array_b` and try to add them again. In a markdown cell, explain what happened.

In [19]:
# shape, size, ndim, and nbytes for each array
print(f"Array A has a shape of {array_a.shape}, a size of {array_a.size}, {array_a.ndim} dimensions, and {array_a.nbytes} bytes.")
print(f"Array B has a shape of {array_b.shape}, a size of {array_b.size}, {array_b.ndim} dimensions, and {array_b.nbytes} bytes.")

# transpose of each array
t_array_a = array_a.T
t_array_b = array_b.T

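# add array_a and t_array_b: this raises a ValueError because the
# shapes (3, 4) and (3, 5) cannot be broadcast together
# new_array = array_a + t_array_b

# remove the last column of t_array_b so both arrays have shape (3, 4)
t_array_b = t_array_b[:, :-1]
new_array = array_a + t_array_b
new_array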




Array A has a shape of (3, 4), a size of 12, 2 dimensions, and 96 bytes.
Array B has a shape of (5, 3), a size of 15, 2 dimensions, and 120 bytes.


array([[ 45. ,  38. ,  16. ,   9. ],
       [ 38.5,  25. ,  20. ,   6. ],
       [ 29. ,  29. , -32. ,   0. ]])

The first addition of `array_a` and `t_array_b` fails because the arrays have different shapes: `array_a` has a shape of (3, 4), and `t_array_b` has a shape of (3, 5). NumPy adds arrays element by element, so the shapes must match (or be broadcastable).

The second addition works because, after the last column is removed from `t_array_b`, both arrays have a shape of (3, 4).
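The shape mismatch can be reproduced on small stand-in arrays. The following is a minimal sketch, assuming placeholder arrays `a` and `b` (hypothetical names, filled with ones and consecutive integers rather than the lab's values). It catches the `ValueError` that NumPy raises for incompatible shapes, then retries after dropping the last column.

```python
import numpy as np

# Hypothetical stand-ins with the same shapes as array_a and t_array_b
a = np.ones((3, 4))
b = np.arange(15).reshape(3, 5)

try:
    a + b  # shapes (3, 4) and (3, 5) cannot be broadcast together
    failed = False
except ValueError as err:
    failed = True
    print(f"Addition failed: {err}")

# Drop the last column so both arrays have shape (3, 4)
b_trimmed = b[:, :-1]
total = a + b_trimmed
print(total.shape)
```

Slicing with `[:, :-1]` keeps every row and all but the last column, so it does not depend on knowing the array's exact dimensions.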

# Part 3: Pandas
## Part 3.1: Reading in Data (5 points)

On Canvas is the `train_stations_europe.csv` file. It was adapted from [this Kaggle data set](https://www.kaggle.com/datasets/headsortails/train-stations-in-europe). Read this data set in, using the `id` as the index column, and print the first few rows of the data. Make sure you keep the `import` statement below!

In [20]:
# make sure to import pandas library
import pandas as pd

In [27]:
df = pd.read_csv("train_stations_europe.csv", index_col='id')
df.head()

| id | name | latitude | longitude | parent_station_id | country | time_zone | is_city | is_main_station | is_airport |
|----|------|----------|-----------|-------------------|---------|-----------|---------|-----------------|------------|
| 1 | Chateau-Arnoux-St-Auban | 44.08179 | 6.001625 | NaN | FR | Europe/Paris | True | False | False |
| 2 | Chateau-Arnoux-St-Auban | 44.061565 | 5.997373 | 1.0 | FR | Europe/Paris | False | True | False |
| 3 | Chateau-Arnoux Mairie | 44.063863 | 6.011248 | 1.0 | FR | Europe/Paris | False | False | False |
| 4 | Digne-les-Bains | 44.35 | 6.35 | NaN | FR | Europe/Paris | True | False | False |
| 6 | Digne-les-Bains | 44.08871 | 6.222982 | 4.0 | FR | Europe/Paris | False | True | False |
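The effect of `index_col` can be checked without the lab file. The following is a minimal sketch, assuming a made-up two-row CSV held in memory (the station names and values are hypothetical, not taken from the data set). It shows that the `id` column becomes the row index rather than a regular column.

```python
import io

import pandas as pd

# Hypothetical two-row sample loosely following the lab CSV's layout
csv_text = """id,name,country,is_city
10,Station A,FR,True
11,Station B,BE,False
"""

# index_col="id" makes the id column the row index instead of a regular column
sample = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(sample.index.name)
print(list(sample.columns))
```

Rows can then be looked up by their `id` label, for example `sample.loc[10]`.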


## Part 3.2: Manipulating Data (20 points)

1. Create a subset of the data set which (a) **includes** only train stations in Belgium and (b) **excludes** all train stations which are **not** in a city. Make sure to save this subset as a new data frame and print the first few rows of the data.
1. Use the `.describe()` function to produce summary statistics for the subset from the previous part. Create a markdown cell and explain:
    - What Series did the `.describe()` function run on? What Series did it not run on? What is the difference, and what does this mean the `.describe()` function is used for?

In [50]:
df_belgium = df.loc[(df.country == 'BE') & (df.is_city == True), :]
df_belgium.head()

| id | name | latitude | longitude | parent_station_id | country | time_zone | is_city | is_main_station | is_airport |
|----|------|----------|-----------|-------------------|---------|-----------|---------|-----------------|------------|
| 5964 | Antwerpen | 51.221722 | 4.40586 | NaN | BE | Europe/Brussels | True | False | False |
| 5970 | Blandain Ville | 50.617431 | 3.26374 | NaN | BE | Europe/Brussels | True | False | False |
| 5974 | Bruxelles | 50.84652 | 4.351739 | NaN | BE | Europe/Brussels | True | False | False |
| 6003 | Quevy Ville | NaN | NaN | NaN | BE | Europe/Brussels | True | False | False |
| 6006 | Sterpenich Ville | NaN | NaN | NaN | BE | Europe/Brussels | True | False | False |


In [51]:
df_belgium.describe()

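|       | latitude | longitude | parent_station_id |
|-------|----------|-----------|-------------------|
| count | 9.0 | 9.0 | 0.0 |
| mean | 50.719715 | 4.810847 | NaN |
| std | 0.36682 | 0.996916 | NaN |
| min | 49.999779 | 3.26374 | NaN |
| 25% | 50.617431 | 4.351739 | NaN |
| 50% | 50.718089 | 4.444643 | NaN |
| 75% | 50.979756 | 5.701847 | NaN |
| max | 51.221722 | 6.120155 | NaN |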


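The `.describe()` function ran on `latitude`, `longitude`, and `parent_station_id`. It did not run on `name`, `country`, `time_zone`, `is_city`, `is_main_station`, or `is_airport`. The difference is data type: by default, `.describe()` summarizes only numeric columns and skips string and boolean columns, because statistics such as the mean and standard deviation require numeric inputs. This means `.describe()` is used to get a quick numeric summary (count, mean, spread, and quartiles) of a data frame. Note that `parent_station_id` has a count of 0 because every value in this subset is missing, so its remaining statistics are undefined.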
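The default column selection can be seen on a small example. The following is a minimal sketch, assuming a hypothetical three-row data frame `toy` with one numeric, one string, and one boolean column (the values are made up). It compares the default call with `include="all"`, which asks pandas to summarize non-numeric columns as well.

```python
import pandas as pd

# Hypothetical mini data frame with numeric, string, and boolean columns
toy = pd.DataFrame({
    "latitude": [50.8, 51.2, 50.6],
    "name": ["A", "B", "C"],
    "is_city": [True, True, False],
})

# Default: only numeric columns are summarized
numeric_summary = toy.describe()
print(list(numeric_summary.columns))

# include="all" also summarizes non-numeric columns (count, unique, top, freq)
full_summary = toy.describe(include="all")
print(list(full_summary.columns))
```

In the `include="all"` result, statistics that do not apply to a column (for example the mean of `name`) are shown as missing values.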

# Part 4: GitHub (10 points)

When you finish the first three parts, run one last `Restart & Run All` on this file. Then, go to the [DS 3000 GitHub](https://github.com/eaegerber/ds3000_summer25), fork it, and clone your fork to your local machine. Then:

- In the forked and cloned ds3000_summer25 repo, make a new branch (`git checkout -b lab-upload`)
- Create a folder with your name (as Dr. Gerber did)
- Navigate to that folder and place this jupyter notebook inside
- Add the file as the change to be committed (`git add .\Lab1_MyName.ipynb`)
- Check `git status`; your changes should be staged
- Commit the changes with a short message (`git commit -m "My message"`)
- Push the repository to GitHub (`git push origin lab-upload`)
- Navigate to the GitHub repository in the browser and create a pull request
- Once the pull request is merged, you can delete your branch in the "Pull Requests" tab in the browser
- Finally, upload the Lab to Gradescope. When you do so, it will give you the option to do it via GitHub; do that.