# Basic pandas structure and data selection

## 1. About the notebook

This is a `jupyter` notebook. Here you can combine formatted text, Python code, and its output (such as tables or figures) in one document.

Notebooks consist of _cells_, which can be either `code` or `text` (with markdown formatting).

This is a `text` _cell_, above are more `text` _cells_, and below is a `code` _cell_.

In [11]:
import pandas as pd

You can recognize a `code` _cell_ by its monospace font and the square brackets in front, such as `[1]`

To run a `code` _cell_, press the ▶ button at the top, or hit Shift+Enter, or Ctrl+Enter, or Alt+Enter

If a `code` _cell_ has an output, it will be displayed below the _cell_, like here:

In [12]:
print("This is output")

This is output


This _cell_ will have output too:

In [15]:
1 + 9

10

Two very handy parts of `jupyter` notebooks are:
* code completion
* quick documentation

For code completion, start writing code and hit Tab, like here:

In [None]:
pd.

For quick documentation, place your cursor on a function name in a `code` _cell_ and hit Shift+Tab, like here:

In [None]:
pd.read_csv

## 2. Getting data

For this exercise we'll be using free data from the gapminder.org [repository](https://github.com/open-numbers/ddf--gapminder--systema_globalis), available under a CC-BY license.

Let's load the data from its storage on GitHub: two tables, one with information about countries and one with their Gini index by year:

In [13]:
countries = pd.read_csv(
    "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis"
    "/master/ddf--entities--geo--country.csv"
)

In [14]:
gini = pd.read_csv(
    "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis"
    "/master/countries-etc-datapoints/ddf--datapoints--gapminder_gini--by--geo--time.csv"
)
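The same call can be tried without a network connection. This is a minimal sketch, assuming a made-up three-row CSV (the country codes and names below are illustrative and are not taken from the gapminder repository). It relies on the fact that `pd.read_csv` accepts a file-like object as well as a URL or a file path:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file
csv_text = """country,name,landlocked
aaa,Alphaland,coastline
bbb,Betaland,landlocked
ccc,Gammaland,coastline
"""

# read_csv takes a path, a URL, or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (3, 3)
```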

`jupyter` notebooks allow us to look at the `pandas` table in a nice way, just by having the output of a _cell_ be a `pandas` table:

In [16]:
countries

| | country | g77_and_oecd_countries | income_3groups | income_groups | is--country | iso3166_1_alpha2 | iso3166_1_alpha3 | iso3166_1_numeric | iso3166_2 | landlocked | ... | name | un_sdg_ldc | un_sdg_region | un_state | unhcr_region | unicef_region | unicode_region_subtag | west_and_rest | world_4region | world_6region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abkh | others | | | True | | | | | | ... | Abkhazia | | | False | | | | | europe | europe_central_asia |
| 1 | abw | others | high_income | high_income | True | AW | ABW | 533.0 | | coastline | ... | Aruba | un_not_least_developed | un_latin_america_and_the_caribbean | False | unhcr_americas | | AW | | americas | america |
| 2 | afg | g77 | low_income | low_income | True | AF | AFG | 4.0 | | landlocked | ... | Afghanistan | un_least_developed | un_central_and_southern_asia | True | unhcr_asia_pacific | sa | AF | rest | asia | south_asia |
| 3 | ago | g77 | middle_income | lower_middle_income | True | AO | AGO | 24.0 | | coastline | ... | Angola | un_least_developed | un_sub_saharan_africa | True | unhcr_southern_africa | ssa | AO | rest | africa | sub_saharan_africa |
| 4 | aia | others | | | True | AI | AIA | 660.0 | | coastline | ... | Anguilla | un_not_least_developed | un_latin_america_and_the_caribbean | False | unhcr_americas | | AI | | americas | america |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 268 | yem_south | others | | | True | | | | | coastline | ... | South Yemen (former) | | | False | | | | | asia | middle_east_north_africa |
| 269 | yug | others | | | True | | | | | coastline | ... | Yugoslavia | | | False | | | | | europe | europe_central_asia |
| 270 | zaf | g77 | middle_income | upper_middle_income | True | ZA | ZAF | 710.0 | | coastline | ... | South Africa | un_not_least_developed | un_sub_saharan_africa | True | unhcr_southern_africa | ssa | ZA | rest | africa | sub_saharan_africa |
| 271 | zmb | g77 | middle_income | lower_middle_income | True | ZM | ZMB | 894.0 | | landlocked | ... | Zambia | un_least_developed | un_sub_saharan_africa | True | unhcr_southern_africa | ssa | ZM | rest | africa | sub_saharan_africa |


Or a `pandas` series:

In [19]:
countries.country

0           abkh
1            abw
2            afg
3            ago
4            aia
         ...    
268    yem_south
269          yug
270          zaf
271          zmb
272          zwe
Name: country, Length: 273, dtype: object

Compare this nice display with a plain Python view of the same data, for example the first 20 countries as a dictionary:

In [18]:
countries.country[:20].to_dict()

{0: 'abkh',
 1: 'abw',
 2: 'afg',
 3: 'ago',
 4: 'aia',
 5: 'akr_a_dhe',
 6: 'ala',
 7: 'alb',
 8: 'and',
 9: 'ant',
 10: 'are',
 11: 'arg',
 12: 'arm',
 13: 'asm',
 14: 'ata',
 15: 'atg',
 16: 'aus',
 17: 'aut',
 18: 'aze',
 19: 'bdi'}

## 3. Pandas data structure

What we got above from the `pd.read_csv` call is a `pandas` table, called a `DataFrame`. Other `read_*` functions also return a table. We can also construct a table directly from data by calling `pd.DataFrame`.
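As a minimal sketch of building a table by hand, the example below passes a dictionary to `pd.DataFrame`, where each key becomes a column name and each list becomes that column's values. The name `toy` and the values in it are made up for illustration and are not part of the gapminder data:

```python
import pandas as pd

# Illustrative data, not from the gapminder tables
toy = pd.DataFrame(
    {
        "country": ["aaa", "bbb", "ccc"],
        "population": [10, 20, 30],
    }
)

print(toy.shape)          # (3, 2)
print(list(toy.columns))  # ['country', 'population']
```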

`pandas` tables are column-oriented: columns usually represent variables, and rows represent observations.

Some operations throughout `pandas` can be done either by rows or by columns. Such operations take an `axis` parameter, where `0`=`rows` and `1`=`columns`

A mnemonic: the digit `1` looks like a vertical line, and columns are vertical, so `1` is `columns`
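To see the `axis` parameter in action, here is a small sketch on a made-up table (the name `scores` and its values are hypothetical). With `drop`, the axis says where to look for the label. With a reduction such as `sum`, the axis names the dimension that gets collapsed, so `axis=0` adds up the rows and leaves one value per column:

```python
import pandas as pd

# Hypothetical table used only to illustrate the axis parameter
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the label 0 is looked up among the rows (the index)
without_first_row = scores.drop(0, axis=0)
print(without_first_row.shape)  # (2, 2)

# axis=1: the label "b" is looked up among the columns
without_b = scores.drop("b", axis=1)
print(without_b.shape)  # (3, 1)

# axis=0 collapses the rows: one sum per column
col_sums = scores.sum(axis=0)
print(col_sums.tolist())  # [6, 60]

# axis=1 collapses the columns: one sum per row
row_sums = scores.sum(axis=1)
print(row_sums.tolist())  # [11, 22, 33]
```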

Rows and columns have names (or numbers) to address them specifically. For rows this is called `index`, for columns it is just referred to as “columns”
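The row names do not have to be numbers. A minimal sketch, assuming a made-up table with string row labels (all names and values here are hypothetical), shows both sets of labels and how `.loc` uses them to address a single value:

```python
import pandas as pd

# Hypothetical table with string row labels instead of 0, 1, 2, ...
toy = pd.DataFrame(
    {"population": [10, 20], "area": [1.5, 2.5]},
    index=["aaa", "bbb"],
)

print(list(toy.index))    # ['aaa', 'bbb']
print(list(toy.columns))  # ['population', 'area']

# .loc addresses a value by row label and column label
print(toy.loc["bbb", "area"])  # 2.5
```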

Let's examine our `countries` table.

Its dimensions:

In [20]:
countries.shape

(273, 23)

Number of rows (remember, `0`=`rows`)

In [21]:
countries.shape[0]

273

Number of columns

In [22]:
countries.shape[1]

23
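Since `shape` is a plain Python tuple, it can be unpacked in one step. The sketch below uses a made-up table (the name `toy` is hypothetical) and also shows that `len` on a table gives the number of rows:

```python
import pandas as pd

# Hypothetical table: 3 rows, 2 columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = toy.shape  # tuple unpacking
print(n_rows, n_cols)       # 3 2
print(len(toy))             # 3, same as toy.shape[0]
```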

The `index`:

In [23]:
countries.index

RangeIndex(start=0, stop=273, step=1)

The `columns`:

In [24]:
countries.columns

Index(['country', 'g77_and_oecd_countries', 'income_3groups', 'income_groups',
       'is--country', 'iso3166_1_alpha2', 'iso3166_1_alpha3',
       'iso3166_1_numeric', 'iso3166_2', 'landlocked', 'latitude', 'longitude',
       'main_religion_2008', 'name', 'un_sdg_ldc', 'un_sdg_region', 'un_state',
       'unhcr_region', 'unicef_region', 'unicode_region_subtag',
       'west_and_rest', 'world_4region', 'world_6region'],
      dtype='object')

## 4. Pandas Series: one table column

Let's take one column from the table:

In [26]:
countries["country"]

0           abkh
1            abw
2            afg
3            ago
4            aia
         ...    
268    yem_south
269          yug
270          zaf
271          zmb
272          zwe
Name: country, Length: 273, dtype: object
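Earlier the same column was reached as `countries.country`, and here as `countries["country"]`. The sketch below, on a made-up two-row table (the name `toy` and its values are hypothetical), suggests that both forms return the same `Series`, and that the bracket form is the one to use for a column such as `is--country`, whose name is not a valid Python identifier:

```python
import pandas as pd

# Hypothetical table reusing two column names from `countries`
toy = pd.DataFrame({"country": ["aaa", "bbb"], "is--country": [True, True]})

by_key = toy["country"]  # bracket access
by_attr = toy.country    # attribute access

print(by_key.equals(by_attr))  # True
print(type(by_key).__name__)   # Series
print(by_key.name)             # country

# "is--country" contains dashes, so only brackets work for it
flags = toy["is--country"]
print(flags.tolist())  # [True, True]
```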