## Tables 

Tables are a fundamental object type for representing data sets. A table can be viewed in two ways:

- a sequence of named columns that each describe a single aspect of all entries in a data set, or
- a sequence of rows that each contain all information about a single entry in a data set.
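The two views can be sketched on a small table. This is a minimal illustration, assuming pandas; the name `flowers` and its contents are hypothetical stand-ins, not part of the examples that follow.

```python
import pandas as pd

# Hypothetical three-flower data set, used only for illustration.
flowers = pd.DataFrame({
    'Name': ['lotus', 'sunflower', 'rose'],
    'Number of Petals': [8, 34, 5],
})

# Column view: one aspect (the petal count) of every entry.
petal_column = flowers['Number of Petals']

# Row view: every aspect of a single entry (the first flower).
first_row = flowers.iloc[0]

print(petal_column.tolist())  # [8, 34, 5]
print(first_row['Name'])      # lotus
```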

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame() 

In [3]:
data=[8,34,5]

We started above with an empty table that has no columns. Next, we build a table with a single labeled column from the list data.

In [4]:
df = pd.DataFrame(data, columns = ['Number of Petals']) 

In [5]:
df

Unnamed: 0,Number of Petals
0,8
1,34
2,5


In [6]:
df1 = pd.DataFrame() 

To add two (or more) new columns, provide the label and array for each column. All columns must have the same length, or an error will occur.

In [7]:
df1['Number of Petals'] = [8,34,5]
df1['Name'] = ['lotus', 'sunflower', 'rose'] 

In [8]:
df1

Unnamed: 0,Number of Petals,Name
0,8,lotus
1,34,sunflower
2,5,rose
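The length requirement can be checked directly. The sketch below assumes pandas raises a ValueError when a new column's length does not match the existing rows; the names `petals` and `mismatch_error` are illustrative only.

```python
import pandas as pd

petals = pd.DataFrame()
petals['Number of Petals'] = [8, 34, 5]

try:
    # Only two names for three rows, so the assignment is rejected.
    petals['Name'] = ['lotus', 'sunflower']
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(type(mismatch_error).__name__)  # ValueError
print(list(petals.columns))           # ['Number of Petals']
```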


We can give this table a name, and then extend the table with another column.

In [9]:
df11 = pd.DataFrame() 
df11['Number of Petals'] = [8,34,5]
df11['Name'] = ['lotus', 'sunflower', 'rose'] 
df11['Color'] = ['pink', 'yellow', 'red']
df11

Unnamed: 0,Number of Petals,Name,Color
0,8,lotus,pink
1,34,sunflower,yellow
2,5,rose,red


In [10]:
df1

Unnamed: 0,Number of Petals,Name
0,8,lotus
1,34,sunflower
2,5,rose


Creating tables in this way involves a lot of typing. If the data have already been entered somewhere, it is usually possible to use Python to read it into a table, instead of typing it all in cell by cell.

Often, tables are created from files that contain comma-separated values. Such files are called CSV files.

Below, we use the pandas function read_csv to read a CSV file that contains some of the data used by Minard in his graphic about Napoleon's Russian campaign. The data are placed in a table named df.

In [11]:
df = pd.read_csv('minard.csv')
df

Unnamed: 0,Longitude,Latitude,City,Direction,Survivors
0,32.0,54.8,Smolensk,Advance,145000
1,33.2,54.9,Dorogobouge,Advance,140000
2,34.4,55.5,Chjat,Advance,127100
3,37.6,55.8,Moscou,Advance,100000
4,34.3,55.2,Wixma,Retreat,55000
5,32.0,54.6,Smolensk,Retreat,24000
6,30.4,54.4,Orscha,Retreat,20000
7,26.8,54.3,Moiodexno,Retreat,12000
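If the file is not at hand, the same call can be tried on CSV text held in memory. This is a sketch, assuming the first three rows shown above; `csv_text` and `sample` are hypothetical names, and `io.StringIO` simply stands in for a file on disk.

```python
import io
import pandas as pd

# First three rows of the table above, written out as CSV text.
csv_text = """Longitude,Latitude,City,Direction,Survivors
32.0,54.8,Smolensk,Advance,145000
33.2,54.9,Dorogobouge,Advance,140000
34.4,55.5,Chjat,Advance,127100
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (3, 5)
print(list(sample.columns))  # ['Longitude', 'Latitude', 'City', 'Direction', 'Survivors']
```

The first line of the text supplies the column labels, and each later line becomes one row.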


### The Size of the Table

In [12]:
df.shape

(8, 5)

### Column Labels

In [13]:
df.columns

Index(['Longitude', 'Latitude', 'City', 'Direction', 'Survivors'], dtype='object')

We can change column labels using the rename method. By default, this creates a new table and leaves df unchanged.

In [14]:
list(df.columns)

['Longitude', 'Latitude', 'City', 'Direction', 'Survivors']

In [15]:
rows,columns=df.shape

In [16]:
rows

8

In [17]:
columns

5

In [18]:
df_new = df.rename(columns={'City': 'Cityname'})
df_new

Unnamed: 0,Longitude,Latitude,Cityname,Direction,Survivors
0,32.0,54.8,Smolensk,Advance,145000
1,33.2,54.9,Dorogobouge,Advance,140000
2,34.4,55.5,Chjat,Advance,127100
3,37.6,55.8,Moscou,Advance,100000
4,34.3,55.2,Wixma,Retreat,55000
5,32.0,54.6,Smolensk,Retreat,24000
6,30.4,54.4,Orscha,Retreat,20000
7,26.8,54.3,Moiodexno,Retreat,12000


In [19]:
# With inplace=True, rename modifies df itself and returns None, so the result is not assigned
df.rename(columns={'City': 'Cityname'}, inplace=True)
df

With inplace=True, rename modifies df itself and returns None instead of a new table, so the column is labeled Cityname in every example that follows.
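The difference between the two forms of rename can be seen side by side. This is a minimal sketch on a hypothetical two-row table named `cities`, not the Minard data itself.

```python
import pandas as pd

cities = pd.DataFrame({'City': ['Smolensk', 'Moscou'],
                       'Survivors': [145000, 100000]})

# Default behaviour: a new table is returned and the original is untouched.
renamed = cities.rename(columns={'City': 'Cityname'})
print(list(cities.columns))   # ['City', 'Survivors']
print(list(renamed.columns))  # ['Cityname', 'Survivors']

# inplace=True: the original is modified and the call returns None.
result = cities.rename(columns={'City': 'Cityname'}, inplace=True)
print(result)                 # None
print(list(cities.columns))   # ['Cityname', 'Survivors']
```

Assigning the result of an inplace call to a variable therefore stores None rather than a table.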

## Accessing the Data in a Column

We can use a column's label to access the array of data in the column.

In [20]:
df.iat[2,2]  

'Chjat'

In [21]:
df.iat[7,4]  

12000

In [22]:
# Accessing an element by row and column position
df.iat[2, 2]

'Chjat'

In [23]:
df['Latitude']

0    54.8
1    54.9
2    55.5
3    55.8
4    55.2
5    54.6
6    54.4
7    54.3
Name: Latitude, dtype: float64
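Selecting a column by label returns a pandas Series; its values can be taken out as a NumPy array, and a single element can be reached by position or by label. The sketch below uses a hypothetical three-row table, `minard_sample`, so that it runs on its own.

```python
import pandas as pd

minard_sample = pd.DataFrame({
    'City': ['Smolensk', 'Dorogobouge', 'Chjat'],
    'Latitude': [54.8, 54.9, 55.5],
})

latitude = minard_sample['Latitude']   # a pandas Series
latitude_array = latitude.to_numpy()   # the values as a NumPy array

# iat uses integer positions; at uses the row label and the column label.
by_position = minard_sample.iat[2, 0]
by_label = minard_sample.at[2, 'City']

print(latitude_array.tolist())  # [54.8, 54.9, 55.5]
print(by_position, by_label)    # Chjat Chjat
```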

## Working with the Data in a Column

Because columns are arrays, we can use array operations on them to discover new information. For example, we can create a new column that contains the percent of all survivors at each city after Smolensk.

In [24]:
initial = df.Survivors
initial 

0    145000
1    140000
2    127100
3    100000
4     55000
5     24000
6     20000
7     12000
Name: Survivors, dtype: int64

In [25]:
df.iloc[3]['Cityname']

'Moscou'

In [26]:
mean=df

In [27]:
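# Number of survivors at the first city (Smolensk, during the advance)
p = df.iloc[0]['Survivors']
# Survivors at each city as a percentage of that initial count
percentage = (initial / p) * 100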

In [28]:
df['Percentage'] = percentage
df

Unnamed: 0,Longitude,Latitude,Cityname,Direction,Survivors,Percentage
0,32.0,54.8,Smolensk,Advance,145000,100.0
1,33.2,54.9,Dorogobouge,Advance,140000,96.551724
2,34.4,55.5,Chjat,Advance,127100,87.655172
3,37.6,55.8,Moscou,Advance,100000,68.965517
4,34.3,55.2,Wixma,Retreat,55000,37.931034
5,32.0,54.6,Smolensk,Retreat,24000,16.551724
6,30.4,54.4,Orscha,Retreat,20000,13.793103
7,26.8,54.3,Moiodexno,Retreat,12000,8.275862
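The arithmetic behind the new column can be checked on the first three survivor counts. This is a sketch only; the Series `survivors` repeats values from the table above, and the division is applied to every entry at once.

```python
import pandas as pd

survivors = pd.Series([145000, 140000, 127100], name='Survivors')

# Divide every entry by the first one, then scale to a percentage.
percentage = survivors / survivors.iloc[0] * 100

print(percentage.round(2).tolist())  # [100.0, 96.55, 87.66]
```

For example, the second entry is 140000 / 145000 × 100, which is about 96.55, matching the Percentage column above.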


## Choosing Sets of Columns

In [29]:
df[['Cityname','Survivors']]

Unnamed: 0,Cityname,Survivors
0,Smolensk,145000
1,Dorogobouge,140000
2,Chjat,127100
3,Moscou,100000
4,Wixma,55000
5,Smolensk,24000
6,Orscha,20000
7,Moiodexno,12000


The same selection can be made using column indices instead of labels.

In [30]:
# Row position 2, column position 3 (Direction), selected in a single iloc call
p = df.iloc[2, 3]
p

'Advance'
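To select whole columns by position rather than a single element, iloc accepts a list of column positions. The following is a sketch on a hypothetical two-row table, `minard_sample`, with the same column order as the table above.

```python
import pandas as pd

minard_sample = pd.DataFrame({
    'Longitude': [32.0, 33.2],
    'Latitude': [54.8, 54.9],
    'Cityname': ['Smolensk', 'Dorogobouge'],
    'Direction': ['Advance', 'Advance'],
    'Survivors': [145000, 140000],
})

# Selection by label.
by_label = minard_sample[['Cityname', 'Survivors']]

# Selection by position: all rows, columns 2 and 4 (counting from 0).
by_position = minard_sample.iloc[:, [2, 4]]

print(by_label.equals(by_position))  # True
```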

In [31]:
df.Cityname

0       Smolensk
1    Dorogobouge
2          Chjat
3         Moscou
4          Wixma
5       Smolensk
6         Orscha
7      Moiodexno
Name: Cityname, dtype: object

In [32]:
df[['Cityname']]

Unnamed: 0,Cityname
0,Smolensk
1,Dorogobouge
2,Chjat
3,Moscou
4,Wixma
5,Smolensk
6,Orscha
7,Moiodexno


In [33]:
df1 = df.copy()
# drop returns a new table; because the result is not assigned, df1 keeps the Cityname column
df1.drop(['Cityname'], axis=1)
df1


Unnamed: 0,Longitude,Latitude,Cityname,Direction,Survivors,Percentage
0,32.0,54.8,Smolensk,Advance,145000,100.0
1,33.2,54.9,Dorogobouge,Advance,140000,96.551724
2,34.4,55.5,Chjat,Advance,127100,87.655172
3,37.6,55.8,Moscou,Advance,100000,68.965517
4,34.3,55.2,Wixma,Retreat,55000,37.931034
5,32.0,54.6,Smolensk,Retreat,24000,16.551724
6,30.4,54.4,Orscha,Retreat,20000,13.793103
7,26.8,54.3,Moiodexno,Retreat,12000,8.275862
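The output above still contains Cityname because drop returns a new table rather than changing the one it is called on. To keep the smaller table, assign the result to a name. A minimal sketch, with the hypothetical names `table` and `without_city`:

```python
import pandas as pd

table = pd.DataFrame({'Cityname': ['Smolensk', 'Moscou'],
                      'Survivors': [145000, 100000]})

# Assign the result of drop; the original table is left as it was.
without_city = table.drop(columns=['Cityname'])

print(list(table.columns))         # ['Cityname', 'Survivors']
print(list(without_city.columns))  # ['Survivors']
```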
