# This notebook

In this notebook we show some basics of loading CSV files, exploring and manipulating DataFrames, and saving results.

In [1]:
import os
import pandas as pd
import random

## Loading data

Loading is straightforward with [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). Check the documentation for the available options.

In [2]:
data_dir = os.path.join('..', 'data', 'csv')
df = pd.read_csv(os.path.join(data_dir, 'oscar_winners.csv'), sep=';')
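
The `sep` option matters because `read_csv` assumes commas by default. The sketch below uses a small in-memory sample (hypothetical data standing in for a semicolon-separated file, not the real `oscar_winners.csv`) to show what happens with and without `sep=';'`.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for a semicolon-separated file
sample_text = "year;movie\n2019;Green Book\n2018;The Shape of Water\n"

# With the matching separator, each field becomes its own column
df_ok = pd.read_csv(io.StringIO(sample_text), sep=';')

# With the default separator (a comma), every line is read as a single field
df_wrong = pd.read_csv(io.StringIO(sample_text))

print(df_ok.columns.tolist())
print(df_wrong.columns.tolist())
```

If a freshly loaded DataFrame has only one column, a mismatched separator is a likely cause.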

## Exploring data

In [3]:
df.head()

Unnamed: 0,year,movie
0,2019,Green Book
1,2018,The Shape of Water
2,2017,Moonlight
3,2016,Spotlight
4,2015,Birdman


In [4]:
df.shape

(30, 2)

In [5]:
df.describe()

Unnamed: 0,year
count,30.0
mean,2004.5
std,8.803408
min,1990.0
25%,1997.25
50%,2004.5
75%,2011.75
max,2019.0


## Manipulating data

As an example, we add a column of mock float scores, one per movie, and then keep only the rows with high scores.

In [6]:
# scores generated from 8.0 to 9.4 in steps of 0.1, assuming a scale from 0 to 10 where 10 is the best possible score

df['score'] = [random.randrange(80, 95)/10 for _ in range(len(df))]
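
Because the scores are random, every run of the notebook produces a different column. A minimal sketch of making them reproducible, assuming a fixed seed is acceptable for mock data; the helper `make_scores` and the seed value are hypothetical and not part of the original notebook.

```python
import random


def make_scores(n, seed):
    """Return n mock scores between 8.0 and 9.4 (steps of 0.1), reproducible for a given seed."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    # randrange(80, 95) yields integers 80..94, so dividing by 10 gives 8.0..9.4
    return [rng.randrange(80, 95) / 10 for _ in range(n)]


scores_a = make_scores(5, seed=42)  # arbitrary seed, chosen for illustration only
scores_b = make_scores(5, seed=42)
print(scores_a == scores_b)
```

With this in place, the column could be assigned as `df['score'] = make_scores(len(df), seed=42)`.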

In [7]:
# keep only rows with scores greater than 9.0

df_short = df[df['score'] > 9]
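
The filter above relies on a boolean mask: the comparison produces one `True`/`False` value per row, and indexing with that mask keeps only the `True` rows. A small sketch on a toy DataFrame (hypothetical data, not the Oscar winners) makes the intermediate step visible. The `.copy()` call is an optional addition, useful if the filtered frame will be modified later.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({'movie': ['A', 'B', 'C'], 'score': [9.1, 8.7, 9.0]})

# The comparison yields a boolean Series aligned with the rows
mask = toy['score'] > 9

# Indexing with the mask keeps the rows where it is True;
# .copy() gives an independent DataFrame rather than a view-like slice
toy_high = toy[mask].copy()

print(mask.tolist())
print(toy_high)
```

Note that the comparison is strict, so a score of exactly 9.0 is dropped, and the original row labels are preserved in the result.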

In [8]:
df_short

Unnamed: 0,year,movie,score
0,2019,Green Book,9.1
2,2017,Moonlight,9.4
3,2016,Spotlight,9.3
6,2013,Argo,9.4
9,2010,The Hurt Locker,9.1
15,2004,The Lord of the Rings: The Return of the King,9.3
17,2002,A Beautiful Mind,9.2
23,1996,Braveheart,9.1
26,1993,Unforgiven,9.3
27,1992,The Silence of the Lambs,9.1


## Writing the result to a CSV file

In [9]:
df_short.to_csv(os.path.join(data_dir, 'movies_high_scores.csv'))
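
By default `to_csv` writes a comma-separated file and includes the row index as an unnamed first column. The sketch below, which writes to a temporary directory rather than `data_dir`, shows one possible variation: passing `sep=';'` to match the input file's separator and `index=False` to omit the index. Whether to keep the index depends on whether the original row labels are needed downstream.

```python
import os
import tempfile

import pandas as pd

# Two rows taken from the filtered output above, with their original row labels
toy = pd.DataFrame(
    {'year': [2019, 2017], 'movie': ['Green Book', 'Moonlight'], 'score': [9.1, 9.4]},
    index=[0, 2],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'movies_high_scores.csv')
    # Same separator as the input file; the index is left out of the file
    toy.to_csv(out_path, sep=';', index=False)
    # Read the file back to check the round trip
    reloaded = pd.read_csv(out_path, sep=';')

print(reloaded)
```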