<p>Pandas is a Python library that provides data structures and functions
for working with structured data, primarily tabular data.
Pandas is built on top of NumPy, so some familiarity with NumPy
makes Pandas easier to use and understand.
The two data structures you need to know the basics of
are Series and DataFrame.
In short, a Series is a one-dimensional array-like object,
and a DataFrame is a two-dimensional, table-like object.
Both objects also carry additional information about the data,
called metadata, such as index labels, column names, and dtypes.</p>
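<p>As a rough illustration of the metadata mentioned above, the sketch below
builds one small Series and one small DataFrame and prints the attributes that
hold their labels and types. The values and the name <code>counts</code> are
made up for this example.</p>

```python
import pandas as pd

# A Series carries an index, a dtype, and an optional name alongside its values.
s = pd.Series([9, 7, 5], index=["q", "w", "e"], name="counts")
print(s.index)  # the row labels
print(s.dtype)  # the element type
print(s.name)   # the optional name

# A DataFrame carries row labels, column labels, and one dtype per column.
df = pd.DataFrame({"age": [11, 12], "name": ["A", "B"]})
print(df.index)
print(df.columns)
print(df.dtypes)
```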

In [6]:
import pandas as pd

x = pd.Series([9,7,5,3,9])
x

0    9
1    7
2    5
3    3
4    9
dtype: int64

In [3]:
x = pd.Series([9,7,5,3,9], index=["q","w","e","r","t"])
x, x[["q", "t"]]

(q    9
 w    7
 e    5
 r    3
 t    9
 dtype: int64, q    9
 t    9
 dtype: int64)

In [15]:
age = {"A": 10, "B": 11, "C": 13}
pd.Series(age)

A    10
B    11
C    13
dtype: int64

In [4]:
data = {
    "name": ["A", "B", "C"],
    "age": [11, 12, 13],
    "virgin": [True, True, False] 
}
pd.Series(data)

name                [A, B, C]
age              [11, 12, 13]
virgin    [True, True, False]
dtype: object

In [5]:
pd.DataFrame(data, columns=["age", "name", "virgin"])

   age  name  virgin
0   11     A    True
1   12     B    True
2   13     C   False


In [6]:
x.index

Index(['q', 'w', 'e', 'r', 't'], dtype='object')

In [27]:
sorted(x.index)

['e', 'q', 'r', 't', 'w']

In [7]:
x.reindex(sorted(x.index))

e    5
q    9
r    3
t    9
w    7
dtype: int64

In [8]:
y = pd.Series([5,3,1], index=["w","e","r"])
x + y

e     8.0
q     NaN
r     4.0
t     NaN
w    12.0
dtype: float64
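
<p>The NaN entries above appear because <code>q</code> and <code>t</code>
exist only in <code>x</code>. If a missing label should count as zero instead,
one option is <code>Series.add</code> with <code>fill_value</code>. This is a
minimal sketch that rebuilds the same two Series.</p>

```python
import pandas as pd

x = pd.Series([9, 7, 5, 3, 9], index=["q", "w", "e", "r", "t"])
y = pd.Series([5, 3, 1], index=["w", "e", "r"])

# Series.add aligns on labels like +, but fill_value substitutes 0
# for a label that is missing from one of the two operands.
total = x.add(y, fill_value=0)
print(total)
```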

<hr>

In [10]:
import pandas as pd
import numpy as np

bacteria = pd.Series([632, 1638, 569, 115],
                    index=['Firmicutes', 'Proteobacteria', 'Actinobacteria', 'Bacteroidetes'])
bacteria

Firmicutes         632
Proteobacteria    1638
Actinobacteria     569
Bacteroidetes      115
dtype: int64

In [12]:
bacteria[[name.endswith('bacteria') for name in bacteria.index]]

Proteobacteria    1638
Actinobacteria     569
dtype: int64

In [15]:
[name.endswith('bacteria') for name in bacteria.index]

[False, True, True, False]
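
<p>The list comprehension above builds the boolean mask one label at a time.
A possible alternative, sketched here with the same data, is the vectorized
<code>.str</code> accessor on the index, which produces the same mask in one
call.</p>

```python
import pandas as pd

bacteria = pd.Series(
    [632, 1638, 569, 115],
    index=["Firmicutes", "Proteobacteria", "Actinobacteria", "Bacteroidetes"],
)

# The .str accessor applies the string method to every index label
# and returns a boolean array that can be used directly as a mask.
mask = bacteria.index.str.endswith("bacteria")
selected = bacteria[mask]
print(selected)
```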

In [17]:
bacteria.name = 'counts'
bacteria.index.name = 'phylum'
bacteria

phylum
Firmicutes         632
Proteobacteria    1638
Actinobacteria     569
Bacteroidetes      115
Name: counts, dtype: int64

In [19]:
np.log(bacteria)

phylum
Firmicutes        6.448889
Proteobacteria    7.401231
Actinobacteria    6.343880
Bacteroidetes     4.744932
Name: counts, dtype: float64

In [23]:
bacteria[bacteria > 1000]

phylum
Proteobacteria    1638
Name: counts, dtype: int64

In [25]:
bacteria_dict = {'Firmicutes': 632, 'Proteobacteria': 1638, 'Actinobacteria': 569, 'Bacteroidetes': 115}
pd.Series(bacteria_dict)

Firmicutes         632
Proteobacteria    1638
Actinobacteria     569
Bacteroidetes      115
dtype: int64

In [27]:
bacteria2 = pd.Series(bacteria_dict,
                      index=['Cyanobacteria','Firmicutes','Proteobacteria','Actinobacteria'])
bacteria2

Cyanobacteria        NaN
Firmicutes         632.0
Proteobacteria    1638.0
Actinobacteria     569.0
dtype: float64

In [29]:
bacteria2.isnull()

Cyanobacteria      True
Firmicutes        False
Proteobacteria    False
Actinobacteria    False
dtype: bool

In [32]:
bacteria + bacteria2

Actinobacteria    1138.0
Bacteroidetes        NaN
Cyanobacteria        NaN
Firmicutes        1264.0
Proteobacteria    3276.0
dtype: float64
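
<p>Labels that are missing from either operand come out as NaN in the sum.
Two common ways to handle them afterwards are <code>dropna</code> and
<code>fillna</code>. The sketch below rebuilds both Series from the same
dictionary; which option is appropriate depends on what a missing count means
in your data.</p>

```python
import pandas as pd

counts = {"Firmicutes": 632, "Proteobacteria": 1638,
          "Actinobacteria": 569, "Bacteroidetes": 115}
bacteria = pd.Series(counts)
bacteria2 = pd.Series(counts, index=["Cyanobacteria", "Firmicutes",
                                     "Proteobacteria", "Actinobacteria"])

total = bacteria + bacteria2

# Keep only the labels that were present in both Series.
complete = total.dropna()
print(complete)

# Or replace the missing count with zero in one Series.
filled = bacteria2.fillna(0)
print(filled)
```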

In [44]:
data = pd.DataFrame({'value': [632, 1638, 569, 115, 433, 1130, 754, 555],
                     'patient': [1, 1, 1, 1, 2, 2, 2, 2],
                     'phyume': ['Firmicutes', 'Proteobacteria', 'Actinobacteria', 
                                'Bacteroidetes', 'Firmicutes', 'Proteobacteria',
                                'Actinobacteria', 'Bacteroidetes']
                    })
data

   value  patient          phyume
0    632        1      Firmicutes
1   1638        1  Proteobacteria
2    569        1  Actinobacteria
3    115        1   Bacteroidetes
4    433        2      Firmicutes
5   1130        2  Proteobacteria
6    754        2  Actinobacteria
7    555        2   Bacteroidetes


In [63]:
data = data[['phyume', 'value', 'patient']]
data

           phyume  value  patient
0      Firmicutes    632        1
1  Proteobacteria   1638        1
2  Actinobacteria    569        1
3   Bacteroidetes    115        1
4      Firmicutes    433        2
5  Proteobacteria   1130        2
6  Actinobacteria    754        2
7   Bacteroidetes    555        2


In [64]:
data.columns

Index(['phyume', 'value', 'patient'], dtype='object')

In [65]:
data['value']

0     632
1    1638
2     569
3     115
4     433
5    1130
6     754
7     555
Name: value, dtype: int64

In [66]:
data.value

0     632
1    1638
2     569
3     115
4     433
5    1130
6     754
7     555
Name: value, dtype: int64

In [67]:
type(data.value)

pandas.core.series.Series

In [68]:
type(data[['value']])

pandas.core.frame.DataFrame

In [69]:
data.loc[3]

phyume     Bacteroidetes
value                115
patient                1
Name: 3, dtype: object

In [71]:
data.T

                  0               1               2              3           4               5               6              7
phyume   Firmicutes  Proteobacteria  Actinobacteria  Bacteroidetes  Firmicutes  Proteobacteria  Actinobacteria  Bacteroidetes
value           632            1638             569            115         433            1130             754            555
patient           1               1               1              1           2               2               2              2


In [74]:
vals = data.value
vals[5] = 0
vals

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

0     632
1    1638
2     569
3     115
4     433
5       0
6     754
7     555
Name: value, dtype: int64

In [76]:
data

           phyume  value  patient
0      Firmicutes    632        1
1  Proteobacteria   1638        1
2  Actinobacteria    569        1
3   Bacteroidetes    115        1
4      Firmicutes    433        2
5  Proteobacteria      0        2
6  Actinobacteria    754        2
7   Bacteroidetes    555        2


In [80]:
vals = data.value.copy()
vals[5] = 1000
data

           phyume  value  patient
0      Firmicutes    632        1
1  Proteobacteria   1638        1
2  Actinobacteria    569        1
3   Bacteroidetes    115        1
4      Firmicutes    433        2
5  Proteobacteria      0        2
6  Actinobacteria    754        2
7   Bacteroidetes    555        2


In [81]:
data.value[3] = 14
data

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

           phyume  value  patient
0      Firmicutes    632        1
1  Proteobacteria   1638        1
2  Actinobacteria    569        1
3   Bacteroidetes     14        1
4      Firmicutes    433        2
5  Proteobacteria      0        2
6  Actinobacteria    754        2
7   Bacteroidetes    555        2
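
<p>The warning above comes from chained indexing: <code>data.value</code>
returns a Series first, and the assignment is then applied to that
intermediate object. Depending on the pandas version, this may warn or may not
update the frame at all. A single <code>.loc</code> call avoids the ambiguity.
The sketch below uses a shortened copy of the frame, so the four rows are an
assumption made for brevity.</p>

```python
import pandas as pd

data = pd.DataFrame({
    "phyume": ["Firmicutes", "Proteobacteria", "Actinobacteria", "Bacteroidetes"],
    "value": [632, 1638, 569, 115],
})

# One indexing operation with the row label and the column name addresses
# the cell in the original frame, so no intermediate object is involved.
data.loc[3, "value"] = 14
print(data)
```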


In [85]:
data['year'] = 2018
data

           phyume  value  patient  year
0      Firmicutes    632        1  2018
1  Proteobacteria   1638        1  2018
2  Actinobacteria    569        1  2018
3   Bacteroidetes     14        1  2018
4      Firmicutes    433        2  2018
5  Proteobacteria      0        2  2018
6  Actinobacteria    754        2  2018
7   Bacteroidetes    555        2  2018


In [89]:
treatment = pd.Series([0]*4 + [1]*2)
treatment

0    0
1    0
2    0
3    0
4    1
5    1
dtype: int64

In [90]:
month = ['Jan', 'Feb', 'Mar', 'May']
data['month'] = month

ValueError: Length of values does not match length of index
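
<p>A plain list must match the length of the frame, which is why the
assignment above fails. A Series is handled differently: it is aligned on the
index, and rows without a matching label are filled with NaN. The sketch below
shows this with a six-element Series like <code>treatment</code> and a
one-column frame. The column name <code>treatment</code> is chosen for this
example.</p>

```python
import pandas as pd

data = pd.DataFrame({"value": [632, 1638, 569, 115, 433, 1130, 754, 555]})
treatment = pd.Series([0] * 4 + [1] * 2)  # index labels 0..5 only

# Assigning a Series aligns on index labels; rows 6 and 7 have no
# matching label in treatment, so they receive NaN.
data["treatment"] = treatment
print(data)
```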

In [100]:
data['month'] = ['Jan'] * len(data)
data

           phyume  value  patient  year  month
0      Firmicutes    632        1  2018    Jan
1  Proteobacteria   1638        1  2018    Jan
2  Actinobacteria    569        1  2018    Jan
3   Bacteroidetes     14        1  2018    Jan
4      Firmicutes    433        2  2018    Jan
5  Proteobacteria      0        2  2018    Jan
6  Actinobacteria    754        2  2018    Jan
7   Bacteroidetes    555        2  2018    Jan


In [101]:
del data['month']

In [102]:
data

           phyume  value  patient  year
0      Firmicutes    632        1  2018
1  Proteobacteria   1638        1  2018
2  Actinobacteria    569        1  2018
3   Bacteroidetes     14        1  2018
4      Firmicutes    433        2  2018
5  Proteobacteria      0        2  2018
6  Actinobacteria    754        2  2018
7   Bacteroidetes    555        2  2018


In [8]:
df = pd.DataFrame({
    'foo': [1,2,3],
    'bar': [.4, -1.0, 4.5]
    })
df

   foo   bar
0    1   0.4
1    2  -1.0
2    3   4.5


In [9]:
df.index

RangeIndex(start=0, stop=3, step=1)