# Series

A series is a one-dimensional array of data. Every value in a series has the same type, known as its "dtype." The values are labeled by the "index," which starts at 0 by default (like Python sequences), but can be set to other values if we want.

In [1]:
import pandas as pd
from pandas import Series, DataFrame

In [2]:
s = Series([10, 20, 30, 40, 50])

In [3]:
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [4]:
# I can set a different dtype -- either implicitly (from the data, as here), or explicitly

s = Series([10, 20, 30, 40, 50.5])
s

0    10.0
1    20.0
2    30.0
3    40.0
4    50.5
dtype: float64

In [5]:
# I can set the dtype explicitly with dtype= and then a dtype name
s = Series([10, 20, 30, 40, 50], dtype='int8')
s

0    10
1    20
2    30
3    40
4    50
dtype: int8
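
The two ways of choosing a dtype shown above can be put side by side in a minimal sketch. The variable names `implicit`, `explicit`, and `converted` are illustrative only, and the `astype()` step is an addition that the cells above do not use.

```python
import pandas as pd

# Implicit: one float among the integers makes pandas pick float64 for every value.
implicit = pd.Series([10, 20, 30, 40, 50.5])

# Explicit: request a specific dtype when the series is created.
explicit = pd.Series([10, 20, 30, 40, 50], dtype='int8')

# An existing series can be converted with astype(), which returns a new series
# and leaves the original unchanged.
converted = explicit.astype('float64')

print(implicit.dtype, explicit.dtype, converted.dtype)
```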

In [6]:
# I can also set the index to something else -- basically, any values I want!

s = Series([10, 20, 30, 40, 50],
           index=list('abcde'))
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [7]:
# the index can contain strings that are longer than 1 character!

s = Series([10, 20, 30, 40, 50],
           index='abcd ef ghi jkl mn'.split())
s

abcd    10
ef      20
ghi     30
jkl     40
mn      50
dtype: int64

In [8]:
s.loc['ghi']

np.int64(30)

In [9]:
s.iloc[3]  # retrieves from the numeric position 3

np.int64(40)
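
The difference between `.loc` (labels) and `.iloc` (positions) is easiest to see when the index holds integers that do not match the positions. The following is a small sketch with made-up data. The series `t` and the names `by_label` and `by_position` are assumptions for illustration.

```python
import pandas as pd

# Integer labels that deliberately do not match the positions 0, 1, 2.
t = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = t.loc[0]      # the value whose label is 0
by_position = t.iloc[0]  # the value stored in the first position

print(by_label, by_position)
```

Here `t.loc[0]` and `t.iloc[0]` return different values, because the label 0 sits in the second position.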

In [10]:
# we can even assign using s.loc and s.iloc!

s.loc['ghi'] = 99
s

abcd    10
ef      20
ghi     99
jkl     40
mn      50
dtype: int64

In [11]:
s.iloc[0] = 888
s

abcd    888
ef       20
ghi      99
jkl      40
mn       50
dtype: int64

# Data frame

Data frames are 2D tables in pandas. A data frame has an index, just like the series index, which runs from 0 to length - 1 by default, but can be set to anything you want. The column names are integers (starting at 0) by default, but you can set those, too. The values can be any values you would have in a series.

You should think of a data frame as a bunch of series. Each column in the data frame is a distinct series, and thus has its own dtype, which can differ from the dtypes of the other columns.
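
As a rough sketch of this "bunch of series" view, the example below builds a small data frame with custom row labels and column names, then pulls out one column. The labels `r1`/`r2`, the column names `a`/`b`, and the variables `df2` and `col` are all made up for illustration.

```python
import pandas as pd

# A small data frame with explicit row labels (index) and column names.
df2 = pd.DataFrame([[10, 20.5],
                    [40, 50.5]],
                   index=['r1', 'r2'],
                   columns=['a', 'b'])

# Selecting one column gives back a series that shares the data frame's index.
col = df2['a']

print(type(col))
print(df2.dtypes)  # one dtype per column
```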

In [12]:
df = DataFrame([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90],
                [100, 110, 120]])
df

     0    1    2
0   10   20   30
1   40   50   60
2   70   80   90
3  100  110  120


In [13]:
# .loc with a single row label returns that row as a series
df.loc[1]

0    40
1    50
2    60
Name: 1, dtype: int64

In [14]:
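# assigning a float into an integer column changes that column's dtype to float64;
# recent pandas versions emit a FutureWarning for this implicit upcast
df.loc[0, 0] = 10.5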
df


       0    1    2
0   10.5   20   30
1   40.0   50   60
2   70.0   80   90
3  100.0  110  120


In [15]:
df.dtypes

0    float64
1      int64
2      int64
dtype: object
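
Because each column has a single dtype, storing a float in an integer column forces the whole column to change type. One way to make that change explicit, sketched here under the assumption that converting the column up front is acceptable, is to cast with `astype()` before assigning. The data frame `df3` is a made-up example.

```python
import pandas as pd

df3 = pd.DataFrame([[10, 20],
                    [40, 50]])

# Convert column 0 to float64 first, so the assignment below
# does not have to change the column's dtype implicitly.
df3[0] = df3[0].astype('float64')
df3.loc[0, 0] = 10.5

print(df3.dtypes)
```

Only column 0 becomes float64. Column 1 keeps its integer dtype.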

In [16]:
# .values returns the series data as a NumPy array, without the index
s.values

array([888,  20,  99,  40,  50])

In [17]:
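# for a data frame, .values is one 2D NumPy array, so every column is converted
# to a common dtype (here float64, because column 0 holds floats)
df.values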

array([[ 10.5,  20. ,  30. ],
       [ 40. ,  50. ,  60. ],
       [ 70. ,  80. ,  90. ],
       [100. , 110. , 120. ]])
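
The pandas documentation recommends the `to_numpy()` method over the `.values` attribute for getting the underlying array. The sketch below uses a made-up data frame (`df4`, with hypothetical column names `ints` and `floats`) to show the dtype of the combined array alongside the dtype of a single column.

```python
import pandas as pd

df4 = pd.DataFrame({'ints': [1, 2], 'floats': [1.5, 2.5]})

# The whole data frame becomes one 2D array with a single common dtype.
arr = df4.to_numpy()

# A single column keeps its own dtype when converted by itself.
col_arr = df4['ints'].to_numpy()

print(arr.dtype, col_arr.dtype)
```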