# Why Python (or _Why, Python_)?

In [1]:
sales = [1300, 1600, 7000, 4000, 1900, 800]
sales[0]

1300

In [2]:
sales[0:4]

[1300, 1600, 7000, 4000]

What about `sales[-2]`?

In [5]:
sales[-5:-2:2]

[1600, 4000]

What about `sales[-2:-3]`?

What about `sales[-5:-2:2]`?
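One way to reason about these: a negative index `i` refers to position `len(sales) + i`, and a slice with a positive step walks left to right only. A quick sketch to check your answers against, reusing the `sales` list from above:

```python
sales = [1300, 1600, 7000, 4000, 1900, 800]

# A negative index i refers to position len(sales) + i.
print(sales[-2])         # 1900 (position 6 - 2 = 4)

# Start (position 4) is already past stop (position 3) and the step is +1,
# so nothing is selected.
print(sales[-2:-3])      # []

# Positions 1 and 3: start at 1, stop before 4, step 2.
print(sales[-5:-2:2])    # [1600, 4000]

# A negative step walks right to left instead.
print(sales[-2:-3:-1])   # [1900]
```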

How do we subset the 1st, 3rd, 4th, and last elements of the list and store them in another list?
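A plain list does not accept a list of positions inside `[]`, so one possible answer is to loop over the positions you want. The sketch below shows two equivalent spellings. The names `positions`, `subset`, and `same` are made up for this example.

```python
from operator import itemgetter

sales = [1300, 1600, 7000, 4000, 1900, 800]

# 1st, 3rd, 4th and last element, as zero-based positions.
positions = [0, 2, 3, -1]

# Option 1: a list comprehension that looks up each position in turn.
subset = [sales[i] for i in positions]
print(subset)  # [1300, 7000, 4000, 800]

# Option 2: itemgetter returns a tuple, so wrap it in list().
same = list(itemgetter(*positions)(sales))
print(same)    # [1300, 7000, 4000, 800]
```

Note that `sales[[0, 2, 3, -1]]` raises a `TypeError`: list indices must be integers or slices.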

## But we have Pandas

In [6]:
import pandas as pd

so = pd.read_csv('data_input/stackoverflow_r.csv', index_col=0)
so.head()

| # | User Link | Reputation | Tag Score |
|---|---|---|---|
| 1 | 7515530 | 1323 | 59 |
| 2 | 2707355 | 41 | 4 |
| 3 | 1564659 | 5803 | 4 |
| 4 | 8566724 | 36 | 2 |
| 5 | 1760984 | 4248 | 1 |


`[]` is commonly used as a column selector, as in the following examples:

In [7]:
so['User Link']

#
1      7515530
2      2707355
3      1564659
4      8566724
5      1760984
6      8363928
7      4518857
8      9636225
9     10105253
10     2796506
11     4124208
Name: User Link, dtype: int64

In [8]:
so[['User Link', 'Reputation']]

| # | User Link | Reputation |
|---|---|---|
| 1 | 7515530 | 1323 |
| 2 | 2707355 | 41 |
| 3 | 1564659 | 5803 |
| 4 | 8566724 | 36 |
| 5 | 1760984 | 4248 |
| 6 | 8363928 | 11 |
| 7 | 4518857 | 49 |
| 8 | 9636225 | 1 |
| 9 | 10105253 | 1 |
| 10 | 2796506 | 67 |


In much real-life data science work, you don't have properly labelled DataFrames. You have raw data, with the column names recorded or stored separately in a different file, so you end up working with a DataFrame like this:

In [9]:
so.rename(columns={'User Link': '0', 'Reputation': '1', 'Tag Score': '2'}, inplace=True)
so.head()

| # | 0 | 1 | 2 |
|---|---|---|---|
| 1 | 7515530 | 1323 | 59 |
| 2 | 2707355 | 41 | 4 |
| 3 | 1564659 | 5803 | 4 |
| 4 | 8566724 | 36 | 2 |
| 5 | 1760984 | 4248 | 1 |


But then... what does `so[0:1]` return?

In [14]:
so[0:1]

| # | 0 | 1 | 2 |
|---|---|---|---|
| 1 | 7515530 | 1323 | 59 |


### Reason: Operator Overloading
`[]` is overloaded. This means that, depending on what you pass in, pandas does something completely different. Here are the rules for the different objects you can pass to the bare indexing operator:

- string: return a column as a Series
- list of strings: return all those columns as a DataFrame
- a slice: select rows (can do both label and integer location — confusing!)
- a sequence of booleans: select all rows where True
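The rules above can be sketched on a tiny frame. The data below is a hand-built, three-row stand-in for `stackoverflow_r.csv` (the constructor call and the variable names are assumptions for illustration), and the snippet covers only these four cases, not everything `[]` can do.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the CSV used above.
df = pd.DataFrame(
    {
        "User Link": [7515530, 2707355, 1564659],
        "Reputation": [1323, 41, 5803],
        "Tag Score": [59, 4, 4],
    },
    index=pd.Index([1, 2, 3], name="#"),
)

col = df["Reputation"]                  # string: one column, as a Series
cols = df[["User Link", "Reputation"]]  # list of strings: columns, as a DataFrame
rows = df[0:1]                          # integer slice: rows by position
big = df[df["Reputation"] > 1000]       # booleans: rows where the value is True

# With string labels in the index, the same slice syntax also works by label,
# and a label slice includes both endpoints.
named = df.rename(index={1: "Johnny", 2: "Megy", 3: "Lee"})
by_label = named["Megy":"Lee"]          # rows Megy and Lee
by_position = named[0:1]                # row Johnny

print(type(col).__name__, type(cols).__name__)  # Series DataFrame
print(list(rows.index), list(big.index))        # [1] [1, 3]
print(list(by_label.index), list(by_position.index))
```

The last two lines are where the confusion comes from: the same `[a:b]` syntax selects by label in one case and by position in the other.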

In [9]:
so = pd.read_csv('data_input/stackoverflow_r.csv', index_col=0)
so.head()

| # | User Link | Reputation | Tag Score |
|---|---|---|---|
| 1 | 7515530 | 1323 | 59 |
| 2 | 2707355 | 41 | 4 |
| 3 | 1564659 | 5803 | 4 |
| 4 | 8566724 | 36 | 2 |
| 5 | 1760984 | 4248 | 1 |


In [16]:
so2 = so.head().copy()

In [17]:
so2.rename(index={1:'Johnny', 2:'Megy', 3:'Lee', 4:'Sam', 5:'Andrew'}, inplace=True)
so2

| # | User Link | Reputation | Tag Score |
|---|---|---|---|
| Johnny | 7515530 | 1323 | 59 |
| Megy | 2707355 | 41 | 4 |
| Lee | 1564659 | 5803 | 4 |
| Sam | 8566724 | 36 | 2 |
| Andrew | 1760984 | 4248 | 1 |


In [19]:
so2.loc[['Megy', 'Lee']]['Reputation'] = 30
so2

| # | User Link | Reputation | Tag Score |
|---|---|---|---|
| Johnny | 7515530 | 1323 | 59 |
| Megy | 2707355 | 41 | 4 |
| Lee | 1564659 | 5803 | 4 |
| Sam | 8566724 | 36 | 2 |
| Andrew | 1760984 | 4248 | 1 |


In [20]:
so2.loc['Megy':'Lee']['Reputation'] = 30
so2

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


| # | User Link | Reputation | Tag Score |
|---|---|---|---|
| Johnny | 7515530 | 1323 | 59 |
| Megy | 2707355 | 30 | 4 |
| Lee | 1564659 | 30 | 4 |
| Sam | 8566724 | 36 | 2 |
| Andrew | 1760984 | 4248 | 1 |


**Urghhhh?**
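The warning's own advice is to hand the row and column selectors to a single `.loc` call. Here is a minimal sketch of that form, assuming a hand-built copy of the five-row `so2` frame (the constructor below stands in for the CSV and is not part of the original notebook). Chained indexing such as `so2.loc[...]['Reputation'] = 30` assigns to whatever intermediate object the first selection returned, and that object may be a copy. The single call assigns on `so2` itself.

```python
import pandas as pd

# Hypothetical reconstruction of so2 from the table shown above.
so2 = pd.DataFrame(
    {
        "User Link": [7515530, 2707355, 1564659, 8566724, 1760984],
        "Reputation": [1323, 41, 5803, 36, 4248],
        "Tag Score": [59, 4, 4, 2, 1],
    },
    index=["Johnny", "Megy", "Lee", "Sam", "Andrew"],
)

# Row selector and column selector in one .loc call: no intermediate object.
so2.loc[["Megy", "Lee"], "Reputation"] = 30

print(so2["Reputation"].tolist())  # [1323, 30, 30, 36, 4248]
```

Whether the chained form happens to change the original depends on whether pandas handed back a view or a copy at the first step, which is consistent with the two cells above behaving differently. The details vary across pandas versions.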