# Procedure

**Note: all example code is modeled on my Scandinavian GDP growth DataFrame**

#### Before You Begin...
- *download* **Anaconda Navigator** and *ensure* the software is running smoothly
- *open* **JupyterLab** and familiarize yourself with the software
- *locate* data repository sources
    - see [awesome data sets](https://github.com/awesomedata/awesome-public-datasets#) for ideas
    
 **Note:** **bolded** text signifies importance, while *italicized* text signifies actionable items/steps
 > blockquoted text is **_copyable code_**

## Table of Contents
1) *Choosing* the Data
2) *Creating* Your Notebook
3) *Manipulating* Your Data
4) *Computing* Percent Change
5) *Downloading* the New Data

### Choosing the Data
1. *Browse* various data repositories
    - click here for some [awesome data sets](https://github.com/awesomedata/awesome-public-datasets#)
    - your data should be in a **.csv** file format
2. *Download* the selected data set to your computer
3. *Open* **JupyterLab** from **Anaconda Navigator**
4. *Upload* your .csv file to **JupyterLab**
    - click the **upload icon** next to the refresh icon at the top of the sidebar
    - *locate* and *select* your downloaded data set

### Creating Your Notebook
1. *Click* the large "**+**" icon at the top of the sidebar
2. *Select* "**Python 3**" under the "**Notebook**" tab
    - this should open a new notebook running on the **ipykernel** Python kernel
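
Before moving on, it can help to confirm that the new notebook can see both pandas and your uploaded file. The cell below is only a sketch: `check_setup` is a hypothetical helper (not part of JupyterLab or pandas), and it assumes the data file is named `BCL_GDP.csv`, as in the examples in this guide.

```python
import importlib.util
import sys
from pathlib import Path


def check_setup(csv_name="BCL_GDP.csv"):
    """Hypothetical helper: report basic facts about the notebook environment."""
    return {
        "python_version": sys.version.split()[0],
        # find_spec returns None when a package is not installed
        "pandas_installed": importlib.util.find_spec("pandas") is not None,
        "working_directory": str(Path.cwd()),
        # True only if the file sits in the notebook's working directory
        "csv_found": Path(csv_name).is_file(),
    }


for name, value in check_setup().items():
    print(f"{name}: {value}")
```

If `csv_found` is `False`, the upload probably landed in a different folder than the notebook.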

### Manipulating Your Data
1. *Import* the .csv file with the following code:
    > `import pandas as pd`<br>
    > `GDPcap = pd.read_csv(r'BCL_GDP.csv')`<br>
    > `print(GDPcap)`

2. *Preview* the first 5 rows of your data with the "head" function
    > `GDPcap.head()`
3. *List* the column names so you can isolate specific columns
    > `list(GDPcap.columns)`
    - *select* which columns to keep based on a shared property
        - e.g. Scandinavian nations
4. *Keep* the selected columns in the data table
    > `keep = ['Year', 'DNK', 'FIN', 'NOR', 'SWE', 'USA', 'Euro Area']`<br>
    > `GDPscand = GDPcap.loc[110:130, keep].copy()`
    - *assign* the subset to "**df**", then *type* "**df**" to see your new data table

    > `df = GDPscand`<br>
    > `df`
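
The import-and-select steps above can be tried without the real file. The sketch below builds a tiny in-memory stand-in for `BCL_GDP.csv` (three rows and three country columns, with values copied from the output later in this guide); `sample_csv`, `gdp`, and `subset` are hypothetical names. It also shows that `.loc` slices by row label and tolerates an end label past the last row, which is presumably why `110:130` works on a table whose last row is 129.

```python
import io

import pandas as pd

# Hypothetical stand-in for BCL_GDP.csv; quoted values keep their thousands commas.
sample_csv = io.StringIO(
    'Year,DNK,FIN,NOR\n'
    '2000,"43,603","36,652","63,746"\n'
    '2001,"43,805","37,523","64,740"\n'
    '2002,"43,869","38,072","65,323"\n'
)

gdp = pd.read_csv(sample_csv)

print(gdp.head())         # preview of the first rows (up to 5)
print(list(gdp.columns))  # column names to choose from

keep = ['Year', 'DNK', 'FIN']
# .loc slices by index label; the end label 10 is past the last row (2),
# so the slice simply stops at the end of the table.
subset = gdp.loc[1:10, keep].copy()
print(subset)
```

Note that the country columns load as text because of the commas, which is why the percent change code later strips them before doing arithmetic.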

### Computing Percent Change
**Formula**: $C = \frac{x_2 - x_1}{x_1} \times 100$, where $x_1$ is the starting value and $x_2$ is the ending value
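
As a worked instance of the formula, take Denmark (`DNK`): the 2000 value is 43,603 and the 2019 value is 50,851, both taken from the output later in this guide. Assuming the result is expressed as a percentage (multiplied by 100, as the code in this section does):

```latex
C = \frac{50851 - 43603}{43603} \times 100 = \frac{7248}{43603} \times 100 \approx 16.62
```

This agrees with the value printed for `DNK` in the Code in Action section.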
1. *Code* to find the percent change between two data points in a column:

    > `def noComma(num): return int(num.replace(",", ""))`

    > `def percentChange(val1, val2): return (noComma(val2) / noComma(val1) - 1) * 100`

    > `print(percentChange(df["FIN"][110], df["FIN"][129]))`

    - **noComma** strips the thousands separators (e.g. "36,652" becomes 36652) so the values can be used in arithmetic

2. *Repeat* for each **KEPT** column
    - these are the countries defined as "Scandinavian", plus the comparison columns (USA and Euro Area)
3. *Add* a new row displaying the percent change information


    > `# create the DataFrame`<br>
    > `df_marks = pd.DataFrame(GDPscand)`<br>
    > `print('Original DataFrame\n------------------')`<br>
    > `print(df_marks)`

    > `new_row = {'Year':'% Change', 'DNK':16.62, 'FIN':22.20, 'NOR':13.35, 'SWE':28.99, 'USA':24.98, 'Euro Area':16.61}`<br>
    > `# append the row to the DataFrame (pd.concat replaces DataFrame.append, which pandas 2.0 removed)`<br>
    > `df_marks = pd.concat([df_marks, pd.DataFrame([new_row])], ignore_index=True)`

    > `print('\n\nNew row added to DataFrame\n--------------------------')`<br>
    > `print(df_marks)`
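
The repeat-and-append steps above can also be sketched as a loop, so the new row is computed rather than typed by hand. This is a minimal illustration, not the notebook's exact code: `sample` is a hypothetical two-row frame holding only the 2000 and 2014 rows for two columns (values taken from the output later in this guide), `to_number` and `percent_change` are hypothetical helpers, and `pd.concat` is used in place of `DataFrame.append`.

```python
import pandas as pd

# Hypothetical two-row sample: the 2000 and 2014 rows for two of the kept columns.
sample = pd.DataFrame(
    {"Year": [2000, 2014], "DNK": ["43,603", "46,404"], "FIN": ["36,652", "41,322"]},
    index=[110, 124],
)


def to_number(text):
    """Hypothetical helper: turn a string such as '43,603' into the integer 43603."""
    return int(str(text).replace(",", ""))


def percent_change(first, last):
    """Percent change from first to last: (last - first) / first * 100."""
    first, last = to_number(first), to_number(last)
    return (last - first) / first * 100


# Build the new row by looping over every column except 'Year'.
new_row = {"Year": "% Change"}
for column in sample.columns.drop("Year"):
    change = percent_change(sample[column].iloc[0], sample[column].iloc[-1])
    new_row[column] = round(change, 2)

# pd.concat stands in for DataFrame.append, which pandas 2.0 removed.
with_change = pd.concat([sample, pd.DataFrame([new_row])], ignore_index=True)
print(with_change)
```

Using `.iloc[0]` and `.iloc[-1]` picks the first and last rows by position, so the loop does not depend on the row labels 110 and 129.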

### Downloading the New Data
1. *Export* your new DataFrame subset as a .csv file to make it accessible and shareable
    >`df_marks.to_csv('NewData.csv')`
2. Congrats, you're done!
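
If you want to double-check the export, the sketch below writes a small hypothetical frame to a temporary folder and reads it back. Passing `index=False` is an assumption about what you want: it leaves the row labels out of the file, so reloading does not produce an extra `Unnamed: 0` column. The frame `result` and the temporary path are placeholders, not part of the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for df_marks.
result = pd.DataFrame({"Year": ["2000", "% Change"], "DNK": ["43,603", "16.62"]})

with tempfile.TemporaryDirectory() as folder:
    out_path = Path(folder) / "NewData.csv"
    # index=False leaves the row labels out of the file.
    result.to_csv(out_path, index=False)
    # dtype=str keeps every value as text, so nothing is reinterpreted on load.
    reloaded = pd.read_csv(out_path, dtype=str)

print(reloaded)
print(list(reloaded.columns))
```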

## Code in Action
See the full, cell-by-cell run of my code and data manipulation steps **below**

In [27]:
import pandas as pd 
GDPcap = pd.read_csv (r'BCL_GDP.csv') 
print(GDPcap)

     Year     AUS     AUT     BEL     CAN     CHE     CHL     DEU     DNK  \
0    1890   8,774   4,832   4,593   4,521   8,291   2,124   5,917   4,596   
1    1891   9,185   4,966   4,532   4,579   7,784   2,269   5,886   4,655   
2    1892   7,864   5,045   4,578   4,507   8,357   2,197   5,909   4,732   
3    1893   7,298   5,038   4,579   4,436   8,584   2,279   6,275   4,789   
4    1894   7,411   5,283   4,579   4,601   8,286   2,215   6,405   4,841   
..    ...     ...     ...     ...     ...     ...     ...     ...     ...   
125  2015  48,509  48,980  42,817  46,779  68,523  23,702  48,500  47,156   
126  2016  48,887  49,463  43,230  46,715  68,946  23,790  49,183  48,309   
127  2017  49,484  50,337  43,884  47,624  69,535  23,732  50,274  48,976   
128  2018  49,654  51,304  44,334  47,909  70,923  24,329  50,759  49,899   
129  2019  49,782  51,917  44,734  48,014  71,077  24,296  50,902  50,851   

        ESP  ...     JPN     MEX     NLD     NOR     NZL     PRT     SWE  \

In [29]:
df.head()

| | Year | DNK | FIN | NOR | SWE | USA | Euro Area |
|---|---|---|---|---|---|---|---|
| 110 | 2000 | 43603 | 36652 | 63746 | 37129 | 48689 | 38128 |
| 111 | 2001 | 43805 | 37523 | 64740 | 37566 | 48691 | 38803 |
| 112 | 2002 | 43869 | 38072 | 65323 | 38267 | 49081 | 38944 |
| 113 | 2003 | 43920 | 38742 | 65532 | 39005 | 50054 | 38961 |
| 114 | 2004 | 44976 | 40172 | 67732 | 40537 | 51476 | 39594 |


In [6]:
list(GDPcap.columns)

['Year',
 'AUS',
 'AUT',
 'BEL',
 'CAN',
 'CHE',
 'CHL',
 'DEU',
 'DNK',
 'ESP',
 'FIN',
 'FRA',
 'GBR',
 'GRC',
 'IRL',
 'ITA',
 'JPN',
 'MEX',
 'NLD',
 'NOR',
 'NZL',
 'PRT',
 'SWE',
 'USA',
 'Unnamed: 24',
 'Euro Area']

In [7]:
keep=['Year',
 'DNK',
 'FIN',
 'NOR',
 'SWE',
 'USA',
 'Euro Area']
GDPscand=GDPcap.loc[110:130, keep].copy()

In [8]:
GDPscand

| | Year | DNK | FIN | NOR | SWE | USA | Euro Area |
|---|---|---|---|---|---|---|---|
| 110 | 2000 | 43603 | 36652 | 63746 | 37129 | 48689 | 38128 |
| 111 | 2001 | 43805 | 37523 | 64740 | 37566 | 48691 | 38803 |
| 112 | 2002 | 43869 | 38072 | 65323 | 38267 | 49081 | 38944 |
| 113 | 2003 | 43920 | 38742 | 65532 | 39005 | 50054 | 38961 |
| 114 | 2004 | 44976 | 40172 | 67732 | 40537 | 51476 | 39594 |
| 115 | 2005 | 45900 | 41147 | 69039 | 41529 | 52796 | 39999 |
| 116 | 2006 | 47539 | 42641 | 70128 | 43222 | 53782 | 41051 |
| 117 | 2007 | 47759 | 44710 | 71485 | 44378 | 54273 | 41987 |
| 118 | 2008 | 47236 | 44851 | 70936 | 43835 | 53688 | 41923 |
| 119 | 2009 | 44679 | 41033 | 68837 | 41577 | 51870 | 39924 |


In [16]:
df=GDPscand
df.shape

(20, 7)

In [17]:
df

| | Year | DNK | FIN | NOR | SWE | USA | Euro Area |
|---|---|---|---|---|---|---|---|
| 110 | 2000 | 43603 | 36652 | 63746 | 37129 | 48689 | 38128 |
| 111 | 2001 | 43805 | 37523 | 64740 | 37566 | 48691 | 38803 |
| 112 | 2002 | 43869 | 38072 | 65323 | 38267 | 49081 | 38944 |
| 113 | 2003 | 43920 | 38742 | 65532 | 39005 | 50054 | 38961 |
| 114 | 2004 | 44976 | 40172 | 67732 | 40537 | 51476 | 39594 |
| 115 | 2005 | 45900 | 41147 | 69039 | 41529 | 52796 | 39999 |
| 116 | 2006 | 47539 | 42641 | 70128 | 43222 | 53782 | 41051 |
| 117 | 2007 | 47759 | 44710 | 71485 | 44378 | 54273 | 41987 |
| 118 | 2008 | 47236 | 44851 | 70936 | 43835 | 53688 | 41923 |
| 119 | 2009 | 44679 | 41033 | 68837 | 41577 | 51870 | 39924 |


In [18]:
def noComma(num):
    ans = ""
    for i in num:
        if i != ",":
            ans+= i
    return int(ans)

def percentChange(val1, val2):
    val2 = noComma(val2)
    val1 = noComma(val1)
    return (int(val2)/int(val1) - 1)*100

print(percentChange(df["DNK"][110], df["DNK"][129]))
    

16.622709446597717


In [19]:
# reuses noComma and percentChange as defined in In [18]
print(percentChange(df["FIN"][110], df["FIN"][129]))

22.200698461202673


In [20]:
# reuses noComma and percentChange as defined in In [18]
print(percentChange(df["NOR"][110], df["NOR"][129]))

13.35299469770652


In [21]:
# reuses noComma and percentChange as defined in In [18]
print(percentChange(df["SWE"][110], df["SWE"][129]))

28.998895741872932


In [22]:
# reuses noComma and percentChange as defined in In [18]
print(percentChange(df["USA"][110], df["USA"][129]))

24.985109573004173


In [23]:
# reuses noComma and percentChange as defined in In [18]
print(percentChange(df["Euro Area"][110], df["Euro Area"][129]))

16.61246328157784


In [24]:
#create dataframe
df_marks = pd.DataFrame(GDPscand)
print('Original DataFrame\n------------------')
print(df_marks)

new_row = {'Year':'% Change', 'DNK':16.62, 'FIN':22.20, 'NOR':13.35, 'SWE':28.99, 'USA':24.98, 'Euro Area':16.61}
# append the row to the DataFrame (pd.concat replaces DataFrame.append, which pandas 2.0 removed)
df_marks = pd.concat([df_marks, pd.DataFrame([new_row])], ignore_index=True)

print('\n\nNew row added to DataFrame\n--------------------------')
print(df_marks)

Original DataFrame
------------------
     Year     DNK     FIN     NOR     SWE     USA Euro Area
110  2000  43,603  36,652  63,746  37,129  48,689    38,128
111  2001  43,805  37,523  64,740  37,566  48,691    38,803
112  2002  43,869  38,072  65,323  38,267  49,081    38,944
113  2003  43,920  38,742  65,532  39,005  50,054    38,961
114  2004  44,976  40,172  67,732  40,537  51,476    39,594
115  2005  45,900  41,147  69,039  41,529  52,796    39,999
116  2006  47,539  42,641  70,128  43,222  53,782    41,051
117  2007  47,759  44,710  71,485  44,378  54,273    41,987
118  2008  47,236  44,851  70,936  43,835  53,688    41,923
119  2009  44,679  41,033  68,837  41,577  51,870    39,924
120  2010  45,313  42,147  68,462  43,678  52,761    40,644
121  2011  45,730  43,020  68,243  44,735  53,195    41,370
122  2012  45,662  42,218  69,173  44,144  53,997    40,852
123  2013  45,896  41,645  69,048  44,291  54,616    40,573
124  2014  46,404  41,322  69,619  45,019  55,589    40,978
12

