# 01.B: Organizing data into Series and DataFrames with Pandas

Pandas is another essential package for data analysis and machine learning. It comes with two main data structures: Series and DataFrame. A Series is like a dictionary with keys (also called indexes) and values. A DataFrame represents tabular data with one or more columns.

To use pandas, we first need to import it. Let's also import NumPy.

In [1]:
import numpy as np
import pandas as pd

We can now create series and dataframes.

## Series
A series is a one-dimensional array with axis labels (also called indexes).

### Creating a series
We can create a series by providing an array of values.

In [2]:
s1 = pd.Series([10,20,30,40,50])
print(s1)

0    10
1    20
2    30
3    40
4    50
dtype: int64


Since we did not provide indexes for our data, numeric zero-based indexes (much like those for arrays) will be automatically provided.

We can also create a series by providing custom indexes.

In [3]:
s2 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s2)


a    1
b    2
c    3
d    4
e    5
dtype: int64
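
Since a Series behaves much like a dictionary, it can also be built from one. The following is a minimal sketch of that idea; the name `s_dict` is only illustrative.

```python
import pandas as pd

# Build a Series from a dictionary: keys become indexes, values become data.
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

print(s_dict['b'])          # look up a value by its index, like a dictionary key
print('c' in s_dict)        # membership tests check the indexes, not the values
print(list(s_dict.keys()))  # keys() returns the indexes
```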


Notice that indexes are not required to be unique. For example, we can create a Series with duplicate indexes.

In [4]:
s3 = pd.Series([1,2,3,4,5], index=['a','b','b','c','c']) 
print(s3)

a    1
b    2
b    3
c    4
c    5
dtype: int64


We can create a series from NumPy arrays.

In [5]:
snp = pd.Series(np.arange(2, 11, 2), index=['A','B','C','D','F']) 
print(snp)

A     2
B     4
C     6
D     8
F    10
dtype: int64


Indexes can also be generated from a range.

In [6]:
s4 = pd.Series(np.arange(2, 11, 2), index=np.arange(1, 6)) 
print(s4)

1     2
2     4
3     6
4     8
5    10
dtype: int64


### Accessing elements in a series

We can use the indexes to extract and/or slice elements within a series. For example, given the series `s1`:

In [7]:
s1

0    10
1    20
2    30
3    40
4    50
dtype: int64

Here is the element at index 0:

In [8]:
s1[0]

10

and the elements at indexes 2 and 3:

In [9]:
s1[[2,3]]

2    30
3    40
dtype: int64

or the elements from position 2 up to, but not including, position 4:

In [10]:
s1[2:4]

2    30
3    40
dtype: int64

And given the series `s2`:

In [11]:
s2

a    1
b    2
c    3
d    4
e    5
dtype: int64

Here is the element at index 'c':

In [12]:
s2['c']

3

and here are the elements from index 'b' through index 'd' (label-based slices include the end label):

In [13]:
s2['b':'d']

b    2
c    3
d    4
dtype: int64

And for a series with duplicate indexes such as:

In [14]:
s3

a    1
b    2
b    3
c    4
c    5
dtype: int64

Using a duplicate index returns a series of all the elements with that index.

In [15]:
s3['b']

b    2
b    3
dtype: int64
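
Because a lookup can return either a single value or a whole series, it can help to check for duplicate indexes first. This short sketch recreates `s3` and uses the `is_unique` attribute and the `duplicated` method of the index.

```python
import pandas as pd

s3 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'b', 'c', 'c'])

# is_unique is False as soon as any index appears more than once.
print(s3.index.is_unique)

# duplicated() flags every repeat of an index after its first occurrence.
print(s3.index.duplicated())

# A unique index returns a single value; a duplicate index returns a Series.
print(type(s3['a']))
print(type(s3['b']))
```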

For label-based access, the above use of indexes is the same as using `.loc[]` with the indexes between `[` and `]`. That is

In [16]:
print(snp['A':'D'])
print(s3['b'])

A    2
B    4
C    6
D    8
dtype: int64
b    2
b    3
dtype: int64


is the same as:

In [17]:
print(snp.loc['A':'D'])
print(s3.loc['b'])

A    2
B    4
C    6
D    8
dtype: int64
b    2
b    3
dtype: int64


But sometimes we want to use the position of an index rather than its actual value to access elements within a series. We can use `.iloc[]` with zero-based numeric index positions between `[` and `]`. The position 0 means the first index. For example, given the Series:

In [18]:
snp

A     2
B     4
C     6
D     8
F    10
dtype: int64

We can access its first element:

In [19]:
snp.iloc[0]

2

And its last element:

In [20]:
snp.iloc[snp.size - 1]

10

With `.iloc`, we can index and slice a series in exactly the same way we did in a one-dimensional NumPy array. Here is, for example, how you select every other element in a Series.

In [21]:
snp.iloc[0::2]

A     2
C     6
F    10
dtype: int64
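
The difference between `.loc` and `.iloc` is easiest to see on a series whose indexes are integers that do not start at zero, such as `s4`. The sketch below recreates `s4` and slices it both ways. It also shows that a negative position counts from the end, as in NumPy.

```python
import numpy as np
import pandas as pd

s4 = pd.Series(np.arange(2, 11, 2), index=np.arange(1, 6))

# .loc slices by index value and includes the end label: indexes 2, 3 and 4.
by_label = s4.loc[2:4]
print(by_label)

# .iloc slices by position and excludes the end position: positions 2 and 3.
by_position = s4.iloc[2:4]
print(by_position)

# Position -1 is the last element, the same as s4.iloc[s4.size - 1].
last = s4.iloc[-1]
print(last)
```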

And speaking of NumPy, we can extract the values of a series as a NumPy array.

In [22]:
snp.values

array([ 2,  4,  6,  8, 10])

We can also extract its indexes, which are returned as an `Index` object:

In [23]:
snp.index

Index(['A', 'B', 'C', 'D', 'F'], dtype='object')
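
As an alternative to the `.values` attribute, the `to_numpy()` method returns the same data as a NumPy array, and the index can be turned into a plain Python list with `tolist()`. A small sketch using the same `snp` series:

```python
import numpy as np
import pandas as pd

snp = pd.Series(np.arange(2, 11, 2), index=['A', 'B', 'C', 'D', 'F'])

# to_numpy() returns the values as a NumPy array.
values = snp.to_numpy()
print(values)

# tolist() converts the Index object to a regular Python list.
labels = snp.index.tolist()
print(labels)
```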

## DataFrames
A DataFrame is a two-dimensional table with columns and rows. You can think of each column of a DataFrame as a series sharing the same indexes with the other columns. Each column has a name that can be used to access it.

### Creating DataFrames
The easiest way to create a DataFrame is by using a dictionary of lists or arrays. The keys of this dictionary will become column names. Here is a DataFrame holding a 9-by-4 multiplication table (9 rows and 4 columns).

In [24]:
mtable = pd.DataFrame({
    'byOne': [1,2, 3, 4, 5, 6, 7, 8, 9],
    'byTwo': [2, 4, 6, 8, 10, 12, 14, 16, 18],
    'byThree': [3, 6, 9, 12, 15, 18, 21, 24, 27],
    'byFour': [4, 8, 12, 16, 20, 24, 28, 32, 36]
})

mtable

   byOne  byTwo  byThree  byFour
0      1      2        3       4
1      2      4        6       8
2      3      6        9      12
3      4      8       12      16
4      5     10       15      20
5      6     12       18      24
6      7     14       21      28
7      8     16       24      32
8      9     18       27      36


Since we did not provide indexes, Pandas will create zero-based numeric indexes for us, just like it does for a Series.

We can use NumPy arrays to create the same table:

In [25]:
mtable = pd.DataFrame({
    'byOne': np.arange(1, 10),
    'byTwo': 2 * np.arange(1, 10),
    'byThree': 3 * np.arange(1, 10),
    'byFour': 4 * np.arange(1, 10)
})

print(mtable)

   byOne  byTwo  byThree  byFour
0      1      2        3       4
1      2      4        6       8
2      3      6        9      12
3      4      8       12      16
4      5     10       15      20
5      6     12       18      24
6      7     14       21      28
7      8     16       24      32
8      9     18       27      36
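
The idea that each column is a series sharing the same indexes can be made concrete by building a DataFrame from a dictionary of Series. In this sketch (the names `by_one`, `by_two` and `small` are illustrative only), the two series list their indexes in different orders, and Pandas lines the values up by index rather than by position.

```python
import pandas as pd

by_one = pd.Series([1, 2, 3], index=['x1', 'x2', 'x3'])
# Same indexes as by_one, but listed in reverse order.
by_two = pd.Series([6, 4, 2], index=['x3', 'x2', 'x1'])

# Values are matched by index, so x1 gets 2 and x3 gets 6 in column byTwo.
small = pd.DataFrame({'byOne': by_one, 'byTwo': by_two})
print(small)

# Each column of the result is itself a Series.
print(type(small['byOne']))
```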


We can create a DataFrame from a two-dimensional array. Given an array,

In [26]:
table = np.array([
    np.arange(1, 10),
    2 * np.arange(1, 10),
    3 * np.arange(1, 10),
    4 * np.arange(1, 10)
]).T

print(table)

[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]
 [ 5 10 15 20]
 [ 6 12 18 24]
 [ 7 14 21 28]
 [ 8 16 24 32]
 [ 9 18 27 36]]


we can create a DataFrame from it.

In [27]:
mtable = pd.DataFrame(table)

mtable

   0   1   2   3
0  1   2   3   4
1  2   4   6   8
2  3   6   9  12
3  4   8  12  16
4  5  10  15  20
5  6  12  18  24
6  7  14  21  28
7  8  16  24  32
8  9  18  27  36


Since we did not provide column names or indexes, Pandas provided numeric zero-based column names and indexes for us. We can rename the columns after the DataFrame has been created by assigning custom names to them.

In [28]:
mtable.columns = ['A', 'B', 'C', 'D']
mtable

   A   B   C   D
0  1   2   3   4
1  2   4   6   8
2  3   6   9  12
3  4   8  12  16
4  5  10  15  20
5  6  12  18  24
6  7  14  21  28
7  8  16  24  32
8  9  18  27  36
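
Assigning to `columns` replaces every name at once. When only some columns need new names, the `rename` method with a dictionary of old-to-new names is one option. The sketch below rebuilds the table and renames two columns; note that `rename` returns a new DataFrame and leaves `mtable` as it was.

```python
import numpy as np
import pandas as pd

table = np.array([
    np.arange(1, 10),
    2 * np.arange(1, 10),
    3 * np.arange(1, 10),
    4 * np.arange(1, 10)
]).T
mtable = pd.DataFrame(table, columns=['A', 'B', 'C', 'D'])

# Rename only columns A and D; B and C keep their names.
renamed = mtable.rename(columns={'A': 'byOne', 'D': 'byFour'})
print(renamed.columns)

# The original DataFrame still has its old column names.
print(mtable.columns)
```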


We can provide the column names at the time of creating the DataFrame.

In [29]:
mt = pd.DataFrame(table, columns=['byOne', 'byTwo', 'byThree', 'byFour'])

mt

   byOne  byTwo  byThree  byFour
0      1      2        3       4
1      2      4        6       8
2      3      6        9      12
3      4      8       12      16
4      5     10       15      20
5      6     12       18      24
6      7     14       21      28
7      8     16       24      32
8      9     18       27      36


We can also provide custom indexes.

In [30]:
mt = pd.DataFrame(table, 
                  columns=['byOne', 'byTwo', 'byThree', 'byFour'],
                  index=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'])

mt

    byOne  byTwo  byThree  byFour
x1      1      2        3       4
x2      2      4        6       8
x3      3      6        9      12
x4      4      8       12      16
x5      5     10       15      20
x6      6     12       18      24
x7      7     14       21      28
x8      8     16       24      32
x9      9     18       27      36


We can convert the contents of a DataFrame to a NumPy array:

In [31]:
mt.values

array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16],
       [ 5, 10, 15, 20],
       [ 6, 12, 18, 24],
       [ 7, 14, 21, 28],
       [ 8, 16, 24, 32],
       [ 9, 18, 27, 36]])

We can get its column names:

In [32]:
mt.columns

Index(['byOne', 'byTwo', 'byThree', 'byFour'], dtype='object')

and its indexes:

In [33]:
mt.index

Index(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'], dtype='object')

### Adding new columns and rows to an existing DataFrame
Given the DataFrame:

In [34]:
mt

    byOne  byTwo  byThree  byFour
x1      1      2        3       4
x2      2      4        6       8
x3      3      6        9      12
x4      4      8       12      16
x5      5     10       15      20
x6      6     12       18      24
x7      7     14       21      28
x8      8     16       24      32
x9      9     18       27      36


We can add a new column like this:

In [35]:
mt['byFive'] = 5 * np.arange(1, 10)

mt

    byOne  byTwo  byThree  byFour  byFive
x1      1      2        3       4       5
x2      2      4        6       8      10
x3      3      6        9      12      15
x4      4      8       12      16      20
x5      5     10       15      20      25
x6      6     12       18      24      30
x7      7     14       21      28      35
x8      8     16       24      32      40
x9      9     18       27      36      45


And using `.loc`, we can add a new row like this:

In [36]:
mt.loc['x10'] = 10 * np.arange(1, 6)
mt

     byOne  byTwo  byThree  byFour  byFive
x1       1      2        3       4       5
x2       2      4        6       8      10
x3       3      6        9      12      15
x4       4      8       12      16      20
x5       5     10       15      20      25
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
x10     10     20       30      40      50
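
Assigning through `.loc` adds one row at a time. To add several rows at once, one option is to put them in a second DataFrame with the same columns and join the two with `pd.concat`. This is a sketch on a smaller, hypothetical two-column table; the names `new_rows` and `extended` are illustrative.

```python
import numpy as np
import pandas as pd

mt = pd.DataFrame(
    {'byOne': np.arange(1, 4), 'byTwo': 2 * np.arange(1, 4)},
    index=['x1', 'x2', 'x3'],
)

# The rows to add, with the same column names and their own indexes.
new_rows = pd.DataFrame(
    {'byOne': [4, 5], 'byTwo': [8, 10]},
    index=['x4', 'x5'],
)

# concat stacks the rows of both DataFrames and returns a new DataFrame.
extended = pd.concat([mt, new_rows])
print(extended)
```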


### Dropping a column or a row from a DataFrame
We can use the `drop` method to remove a column from a DataFrame. Note that `drop` returns a new DataFrame and leaves the original unchanged. Let's remove the column `byFive` we added above:

In [37]:
mt.drop(['byFive'], axis=1)

     byOne  byTwo  byThree  byFour
x1       1      2        3       4
x2       2      4        6       8
x3       3      6        9      12
x4       4      8       12      16
x5       5     10       15      20
x6       6     12       18      24
x7       7     14       21      28
x8       8     16       24      32
x9       9     18       27      36
x10     10     20       30      40


Here `axis=1` refers to columns. To drop a row we use `axis=0` and provide an index instead of a column name.

In [38]:
mt.drop(['x10'], axis=0)

    byOne  byTwo  byThree  byFour  byFive
x1      1      2        3       4       5
x2      2      4        6       8      10
x3      3      6        9      12      15
x4      4      8       12      16      20
x5      5     10       15      20      25
x6      6     12       18      24      30
x7      7     14       21      28      35
x8      8     16       24      32      40
x9      9     18       27      36      45
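
Both calls above return a new DataFrame, which is why `mt` still contains the column `byFive` and the row `x10` in the sections that follow. The sketch below, on a smaller hypothetical table, shows this behavior together with the `columns=` and `index=` keywords, which can be used in place of `axis`. To make a drop permanent, assign the result back to the variable.

```python
import pandas as pd

mt = pd.DataFrame(
    {'byOne': [1, 2, 3], 'byTwo': [2, 4, 6], 'byFive': [5, 10, 15]},
    index=['x1', 'x2', 'x3'],
)

# Equivalent to mt.drop(['byFive'], axis=1).
trimmed = mt.drop(columns=['byFive'])

# Equivalent to mt.drop(['x3'], axis=0).
shorter = mt.drop(index=['x3'])

# mt itself has not changed: it still has three columns and three rows.
print(mt.shape)

# Assign the result back to keep the change.
mt = mt.drop(columns=['byFive'])
print(mt.columns)
```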


### DataFrame indexing and slicing

We can use the column names and indexes to access individual cells.

In [39]:
mt['byTwo']['x5']

10
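
The expression `mt['byTwo']['x5']` first extracts a column and then looks up an index in it. A single cell can also be reached in one step by giving `.loc` or `.iloc` both a row and a column, or with the scalar accessors `.at` and `.iat`. The sketch below rebuilds `mt` and reads the same cell four ways.

```python
import numpy as np
import pandas as pd

rows = np.arange(1, 11)
mt = pd.DataFrame(
    {'byOne': rows, 'byTwo': 2 * rows, 'byThree': 3 * rows,
     'byFour': 4 * rows, 'byFive': 5 * rows},
    index=[f'x{i}' for i in rows],
)

# Row first, then column, for every accessor below.
by_label = mt.loc['x5', 'byTwo']       # index and column name
by_label_scalar = mt.at['x5', 'byTwo'] # same, for single values only
by_position = mt.iloc[4, 1]            # fifth row, second column
by_position_scalar = mt.iat[4, 1]      # same, for single values only

print(by_label, by_label_scalar, by_position, by_position_scalar)
```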

To extract a single column, use its name:

In [40]:
mt['byThree']

x1      3
x2      6
x3      9
x4     12
x5     15
x6     18
x7     21
x8     24
x9     27
x10    30
Name: byThree, dtype: int64

Use a list of column names to extract multiple columns:

In [41]:
mt[['byTwo', 'byFour']]

     byTwo  byFour
x1       2       4
x2       4       8
x3       6      12
x4       8      16
x5      10      20
x6      12      24
x7      14      28
x8      16      32
x9      18      36
x10     20      40


Use `.loc` to access a specific row using its index:

In [42]:
mt.loc['x6']

byOne       6
byTwo      12
byThree    18
byFour     24
byFive     30
Name: x6, dtype: int64

Similarly, we can extract rows by their index positions using `.iloc[]`. For example, we can extract the first row using its index position `0` (the index itself is `x1`).

In [43]:
mt.iloc[0]

byOne      1
byTwo      2
byThree    3
byFour     4
byFive     5
Name: x1, dtype: int64

We can also use `.iloc` to access certain columns and rows based on their positions, much like the indexing and slicing of a two-dimensional NumPy array. Here is the whole DataFrame:

In [44]:
mt.iloc[:, :]

     byOne  byTwo  byThree  byFour  byFive
x1       1      2        3       4       5
x2       2      4        6       8      10
x3       3      6        9      12      15
x4       4      8       12      16      20
x5       5     10       15      20      25
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
x10     10     20       30      40      50


In the `[:, :]` expression, the first `:` refers to all rows and the second `:` refers to all columns.

Here is a slice from the third row (position 2) up to, but not including, the sixth row (position 5), and from the second column (position 1) up to, but not including, the fifth column (position 4).

In [45]:
mt.iloc[2:5, 1:4]

    byTwo  byThree  byFour
x3      6        9      12
x4      8       12      16
x5     10       15      20
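
The same block of cells can be selected by name with `.loc`. Unlike position-based slices, label-based slices include their end points, so the slice below names `x5` and `byFour` explicitly. This sketch rebuilds `mt` and checks that both selections agree.

```python
import numpy as np
import pandas as pd

rows = np.arange(1, 11)
mt = pd.DataFrame(
    {'byOne': rows, 'byTwo': 2 * rows, 'byThree': 3 * rows,
     'byFour': 4 * rows, 'byFive': 5 * rows},
    index=[f'x{i}' for i in rows],
)

# Positions 2, 3, 4 for rows and 1, 2, 3 for columns (end positions excluded).
by_position = mt.iloc[2:5, 1:4]

# Rows x3 through x5 and columns byTwo through byFour (end labels included).
by_label = mt.loc['x3':'x5', 'byTwo':'byFour']

print(by_label)
print(by_position.equals(by_label))
```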


And here is every other column and row.

In [46]:
mt.iloc[::2, ::2]

    byOne  byThree  byFive
x1      1        3       5
x3      3        9      15
x5      5       15      25
x7      7       21      35
x9      9       27      45


Finally, we can use the methods `head` and `tail` to display the first or last few rows of a DataFrame (five by default). This is useful for exploring large DataFrames.

In [47]:
print(mt.head())
print(mt.head(6))

print(mt.tail())
print(mt.tail(7))

    byOne  byTwo  byThree  byFour  byFive
x1      1      2        3       4       5
x2      2      4        6       8      10
x3      3      6        9      12      15
x4      4      8       12      16      20
x5      5     10       15      20      25
    byOne  byTwo  byThree  byFour  byFive
x1      1      2        3       4       5
x2      2      4        6       8      10
x3      3      6        9      12      15
x4      4      8       12      16      20
x5      5     10       15      20      25
x6      6     12       18      24      30
     byOne  byTwo  byThree  byFour  byFive
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
x10     10     20       30      40      50
     byOne  byTwo  byThree  byFour  byFive
x4       4      8       12      16      20
x5       5     10       15      20      25
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
x10     10     20       30      40      50

### Summarizing dataframes
Given the following 50 by 10 DataFrame:

In [48]:
data = pd.DataFrame(np.random.randn(50,10), columns=list('ABCDEFGHIJ'))

In [49]:
data

           A          B          C          D          E          F          G          H          I          J
0   0.426313  -0.787721  -1.224338   0.893432  -1.352372   0.388699   0.780838   0.161799   0.762249   0.935126
1  -0.992258   0.690064  -0.049687  -0.162374   1.182594  -0.384349  -0.860794  -1.303309  -0.247550   1.271908
2  -1.600852  -0.750423  -1.054071  -0.008069  -0.482367   0.103176  -0.087262   0.619026  -0.883828   0.714544
3  -2.085142   0.582091   1.393866  -0.736851  -0.746285   0.592250  -0.010729  -0.280606  -1.472615  -1.050276
4  -1.273573  -0.735513   0.278379   1.275783   0.566976   0.110183   1.615711   0.547422   2.623677  -0.718434
5   0.377197   0.268434   0.402560   0.817262   0.702980   0.573296   0.458199  -1.483077  -0.348366  -0.310876
6   1.706675   0.089115   1.272565   2.107148  -0.816521   1.192772  -0.296606  -1.576477   0.065436   0.676060
7   0.036817   1.554621   0.788182  -0.646296  -0.419567   1.282313   0.876022   0.843546   1.373736  -0.690190
8  -0.895977   0.339033   0.050729  -0.533551   2.379354  -0.689652  -1.285117  -1.707093  -0.252063   0.169974
9   0.421808   1.092169  -0.221153  -1.146687   0.036680  -0.241575  -1.126784   0.373656  -0.261116   0.531397


We can statistically summarize it like this:

In [50]:
data.describe()

               A          B          C          D          E          F          G          H          I          J
count  50.000000  50.000000  50.000000  50.000000  50.000000  50.000000  50.000000  50.000000  50.000000  50.000000
mean    0.107021   0.091077   0.165059   0.030971   0.003089   0.189486   0.059288  -0.148249  -0.064152   0.030649
std     0.952063   1.129493   0.879518   0.999614   0.963624   0.865338   0.851459   0.949466   1.129998   0.932004
min    -2.085142  -3.285639  -1.624731  -1.809766  -2.717593  -2.201730  -1.547584  -2.597858  -2.363694  -2.395816
25%    -0.452986  -0.765015  -0.239697  -0.802468  -0.522363  -0.275924  -0.638832  -0.782142  -0.717678  -0.711373
50%     0.057290   0.197372   0.093704  -0.080611  -0.087088   0.142972  -0.036931  -0.100050  -0.195109   0.066388
75%     0.869474   0.795641   0.885972   0.881767   0.617075   0.719096   0.747181   0.643673   0.724946   0.544164
max     2.853262   3.072399   2.155281   2.135799   2.379354   2.315286   1.929163   1.727941   3.172556   2.196478


We can also get the mean of each column:

In [51]:
data.mean()

A    0.107021
B    0.091077
C    0.165059
D    0.030971
E    0.003089
F    0.189486
G    0.059288
H   -0.148249
I   -0.064152
J    0.030649
dtype: float64

and the variance of each column:

In [52]:
data.var()

A    0.906424
B    1.275755
C    0.773552
D    0.999229
E    0.928570
F    0.748811
G    0.724983
H    0.901486
I    1.276895
J    0.868631
dtype: float64

and the standard deviation of each column:

In [53]:
data.std()

A    0.952063
B    1.129493
C    0.879518
D    0.999614
E    0.963624
F    0.865338
G    0.851459
H    0.949466
I    1.129998
J    0.932004
dtype: float64

### Transposing DataFrames
We can transpose a DataFrame by making its columns rows and its rows columns. That also means swapping the column names and the indexes.

In [54]:
data.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49
A,0.426313,-0.992258,-1.600852,-2.085142,-1.273573,0.377197,1.706675,0.036817,-0.895977,0.421808,...,1.829465,0.212487,0.305061,1.200701,-0.086568,-1.029298,-0.456373,-0.34377,-0.088836,-0.31946
B,-0.787721,0.690064,-0.750423,0.582091,-0.735513,0.268434,0.089115,1.554621,0.339033,1.092169,...,1.084604,-1.250164,-1.384498,-0.778513,-0.098533,-0.244062,0.816332,1.010273,0.733567,3.072399
C,-1.224338,-0.049687,-1.054071,1.393866,0.278379,0.40256,1.272565,0.788182,0.050729,-0.221153,...,0.922062,-0.681394,0.244176,0.892624,1.39597,1.59604,-0.18848,0.360144,0.229747,0.903802
D,0.893432,-0.162374,-0.008069,-0.736851,1.275783,0.817262,2.107148,-0.646296,-0.533551,-1.146687,...,1.163527,-0.372703,-1.441362,0.002695,0.035968,-1.127986,-0.25374,-0.952976,0.594545,-0.90026
E,-1.352372,1.182594,-0.482367,-0.746285,0.566976,0.70298,-0.816521,-0.419567,2.379354,0.03668,...,0.036508,0.218476,-0.690596,0.843146,-0.096236,1.407054,-0.851245,-0.020469,0.262723,-0.376309
F,0.388699,-0.384349,0.103176,0.59225,0.110183,0.573296,1.192772,1.282313,-0.689652,-0.241575,...,0.487104,1.116526,0.163163,-0.258137,-2.20173,0.322942,-0.051668,1.052827,-1.804256,0.217875
G,0.780838,-0.860794,-0.087262,-0.010729,1.615711,0.458199,-0.296606,0.876022,-1.285117,-1.126784,...,-0.247985,-0.648365,-0.994253,-0.355217,-0.750337,0.067546,1.574738,-0.700469,1.432011,-0.135733
H,0.161799,-1.303309,0.619026,-0.280606,0.547422,-1.483077,-1.576477,0.843546,-1.707093,0.373656,...,0.174588,-1.395959,-1.226895,-0.614838,0.708286,0.735212,0.789795,-0.554859,-1.226823,0.836101
I,0.762249,-0.24755,-0.883828,-1.472615,2.623677,-0.348366,0.065436,1.373736,-0.252063,-0.261116,...,1.163103,0.613038,-1.37131,1.170115,-1.372964,0.1333,-0.04983,-1.617304,1.224584,-1.123775
J,0.935126,1.271908,0.714544,-1.050276,-0.718434,-0.310876,0.67606,-0.69019,0.169974,0.531397,...,-1.053624,0.110745,-0.458074,-0.21265,2.196478,0.891856,0.060698,-0.066278,-1.079608,0.240918


which is the same as:

In [55]:
data.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49
A,0.426313,-0.992258,-1.600852,-2.085142,-1.273573,0.377197,1.706675,0.036817,-0.895977,0.421808,...,1.829465,0.212487,0.305061,1.200701,-0.086568,-1.029298,-0.456373,-0.34377,-0.088836,-0.31946
B,-0.787721,0.690064,-0.750423,0.582091,-0.735513,0.268434,0.089115,1.554621,0.339033,1.092169,...,1.084604,-1.250164,-1.384498,-0.778513,-0.098533,-0.244062,0.816332,1.010273,0.733567,3.072399
C,-1.224338,-0.049687,-1.054071,1.393866,0.278379,0.40256,1.272565,0.788182,0.050729,-0.221153,...,0.922062,-0.681394,0.244176,0.892624,1.39597,1.59604,-0.18848,0.360144,0.229747,0.903802
D,0.893432,-0.162374,-0.008069,-0.736851,1.275783,0.817262,2.107148,-0.646296,-0.533551,-1.146687,...,1.163527,-0.372703,-1.441362,0.002695,0.035968,-1.127986,-0.25374,-0.952976,0.594545,-0.90026
E,-1.352372,1.182594,-0.482367,-0.746285,0.566976,0.70298,-0.816521,-0.419567,2.379354,0.03668,...,0.036508,0.218476,-0.690596,0.843146,-0.096236,1.407054,-0.851245,-0.020469,0.262723,-0.376309
F,0.388699,-0.384349,0.103176,0.59225,0.110183,0.573296,1.192772,1.282313,-0.689652,-0.241575,...,0.487104,1.116526,0.163163,-0.258137,-2.20173,0.322942,-0.051668,1.052827,-1.804256,0.217875
G,0.780838,-0.860794,-0.087262,-0.010729,1.615711,0.458199,-0.296606,0.876022,-1.285117,-1.126784,...,-0.247985,-0.648365,-0.994253,-0.355217,-0.750337,0.067546,1.574738,-0.700469,1.432011,-0.135733
H,0.161799,-1.303309,0.619026,-0.280606,0.547422,-1.483077,-1.576477,0.843546,-1.707093,0.373656,...,0.174588,-1.395959,-1.226895,-0.614838,0.708286,0.735212,0.789795,-0.554859,-1.226823,0.836101
I,0.762249,-0.24755,-0.883828,-1.472615,2.623677,-0.348366,0.065436,1.373736,-0.252063,-0.261116,...,1.163103,0.613038,-1.37131,1.170115,-1.372964,0.1333,-0.04983,-1.617304,1.224584,-1.123775
J,0.935126,1.271908,0.714544,-1.050276,-0.718434,-0.310876,0.67606,-0.69019,0.169974,0.531397,...,-1.053624,0.110745,-0.458074,-0.21265,2.196478,0.891856,0.060698,-0.066278,-1.079608,0.240918
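
On a wide table the effect of a transpose is hard to read, so here is a sketch on a small hypothetical DataFrame (`small` is an illustrative name). It confirms that the shape is flipped, that the column names and indexes trade places, and that transposing twice gives back the original.

```python
import pandas as pd

small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['r1', 'r2', 'r3'])
flipped = small.T

print(small.shape, flipped.shape)  # 3 rows by 2 columns becomes 2 by 3
print(flipped)

# The cell at row r2, column B is now at row B, column r2.
print(small.loc['r2', 'B'] == flipped.loc['B', 'r2'])

# Transposing twice restores the original DataFrame.
print(flipped.T.equals(small))
```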


### Sorting a DataFrame

We can sort indexes, column names, or the values of a DataFrame. For example, the following sorts the indexes (`axis=0`) of the DataFrame in ascending order. Because the indexes are strings, the order is lexicographic, so `x10` comes right after `x1`:

In [56]:
mt.sort_index(axis=0)

     byOne  byTwo  byThree  byFour  byFive
x1       1      2        3       4       5
x10     10     20       30      40      50
x2       2      4        6       8      10
x3       3      6        9      12      15
x4       4      8       12      16      20
x5       5     10       15      20      25
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
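
When the indexes carry a number, as with `x1` to `x10`, a numeric order is often what is wanted. One possible approach, sketched below on a one-column version of the table, is to sort the index values with Python's `sorted` using the numeric part as the key and then select the rows in that order with `.loc`. The sketch assumes every index has the form `x` followed by digits.

```python
import numpy as np
import pandas as pd

mt = pd.DataFrame(
    {'byOne': np.arange(1, 11)},
    index=[f'x{i}' for i in range(1, 11)],
)

# sort_index compares the indexes as strings, so x10 sorts before x2.
lexicographic = mt.sort_index()
print(lexicographic.index.tolist())

# Sort by the number after the leading 'x' (assumed format: x<digits>).
ordered = sorted(lexicographic.index, key=lambda name: int(name[1:]))
natural = lexicographic.loc[ordered]
print(natural.index.tolist())
```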


The following sorts the columns (`axis=1`) of the DataFrame in a descending order:

In [57]:
mt.sort_index(axis=1, ascending=False)

     byTwo  byThree  byOne  byFour  byFive
x1       2        3      1       4       5
x2       4        6      2       8      10
x3       6        9      3      12      15
x4       8       12      4      16      20
x5      10       15      5      20      25
x6      12       18      6      24      30
x7      14       21      7      28      35
x8      16       24      8      32      40
x9      18       27      9      36      45
x10     20       30     10      40      50


Here is how to sort the rows by the values of a given column:

In [58]:
mt.sort_values('byThree', ascending=False)

     byOne  byTwo  byThree  byFour  byFive
x10     10     20       30      40      50
x9       9     18       27      36      45
x8       8     16       24      32      40
x7       7     14       21      28      35
x6       6     12       18      24      30
x5       5     10       15      20      25
x4       4      8       12      16      20
x3       3      6        9      12      15
x2       2      4        6       8      10
x1       1      2        3       4       5


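and here is how to sort the columns by the values of a given row (the values in row `x5` are already in ascending order, so the output is unchanged):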

In [59]:
mt.sort_values('x5', ascending=True, axis=1)

     byOne  byTwo  byThree  byFour  byFive
x1       1      2        3       4       5
x2       2      4        6       8      10
x3       3      6        9      12      15
x4       4      8       12      16      20
x5       5     10       15      20      25
x6       6     12       18      24      30
x7       7     14       21      28      35
x8       8     16       24      32      40
x9       9     18       27      36      45
x10     10     20       30      40      50
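
To see the columns actually move, here is a sketch on a small hypothetical DataFrame (`scores` is an illustrative name) whose rows are not already in order. With `axis=1`, `sort_values` takes an index and reorders the columns by the values in that row.

```python
import pandas as pd

scores = pd.DataFrame(
    {'A': [3, 1], 'B': [1, 2], 'C': [2, 3]},
    index=['r1', 'r2'],
)

# Row r1 holds A=3, B=1, C=2, so ascending order of columns is B, C, A.
by_r1 = scores.sort_values('r1', axis=1)
print(by_r1)

# Row r2 holds A=1, B=2, C=3, so descending order of columns is C, B, A.
by_r2_desc = scores.sort_values('r2', axis=1, ascending=False)
print(by_r2_desc)
```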


## EXERCISE

Recreate the 12 times table (see https://www.mathsisfun.com/tables.html) one column at a time. Start by creating an empty DataFrame representing the table. Use a series object to create the first column and add it to the table. Move on to the next column and do the same. And so on for the remaining columns. You may use loops and expressions to calculate the values of these columns, but no hard-coding of values and no code repetition.

In [60]:
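import numpy as np
import pandas as pd

# The multipliers 1 through 12, shared by every column.
x = pd.Series(np.arange(1, 13))

# Start with an empty table and add one column per multiplicand.
table = pd.DataFrame()
for i in range(1, 13):
    table[i] = i * x

print("               12x12 Times Table:")
print(table)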