In [1]:
import numpy as np
import pandas as pd

---

### Basic data structures in pandas

1. `Series`: __a one-dimensional labeled array__ that can hold data of any type, such as integers, strings, or Python objects.

2. `DataFrame`: __a two-dimensional data structure__ that holds data like a two-dimensional array or __a table with rows and columns__.
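
As a minimal sketch of the two structures, the snippet below builds a `Series` with explicit string labels and a small `DataFrame`. The names `temps` and `table`, and all values, are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values plus a label for each value.
temps = pd.Series([21.5, 19.0, 23.1], index=["mon", "tue", "wed"])
print(temps["tue"])           # look a value up by its label

# A DataFrame: a table whose columns are Series sharing one row index.
table = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})
print(table.shape)            # (rows, columns)
print(type(table["name"]))    # each column is itself a Series
```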

---

### Object creation

In [4]:
# letting pandas create a default RangeIndex
s = pd.Series([1, 2, 3, np.nan, 5, 6])

In [5]:
s

0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    6.0
dtype: float64
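
The output above shows two defaults at work: pandas assigns a `RangeIndex` when no labels are given, and the presence of `np.nan` turns the integers into floats. The sketch below checks both. The nullable `'Int64'` dtype is shown only as an alternative and is not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6])
print(type(s.index).__name__)   # RangeIndex
print(s.dtype)                  # float64, because NaN is a float

# Without a missing value the same integers stay integers.
no_gap = pd.Series([1, 2, 3, 5, 6])
print(no_gap.dtype)

# The nullable integer dtype keeps integers alongside a missing value.
nullable = pd.Series([1, 2, 3, None, 5, 6], dtype="Int64")
print(nullable.isna().sum())
```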

---

__Creating a `DataFrame`__ by passing a NumPy array, with a datetime index built by `pd.date_range()` and labeled columns:

In [26]:
# the default frequency is daily ('D')

pd.date_range('2013-01-01', periods=5)

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05'],
              dtype='datetime64[ns]', freq='D')

In [27]:
# If you want the first day of each month, use the frequency 'MS' (Month Start):

pd.date_range('2013-01-01', periods=5, freq='MS')

DatetimeIndex(['2013-01-01', '2013-02-01', '2013-03-01', '2013-04-01',
               '2013-05-01'],
              dtype='datetime64[ns]', freq='MS')

In [28]:
# 'ME' stands for Month End frequency (pandas 2.2+; older versions use 'M').

pd.date_range('2013-01-01', periods=5, freq='ME')

DatetimeIndex(['2013-01-31', '2013-02-28', '2013-03-31', '2013-04-30',
               '2013-05-31'],
              dtype='datetime64[ns]', freq='ME')
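
The string aliases correspond to offset objects in `pd.offsets`. Since `'ME'` is only accepted by recent pandas releases, passing the offset objects directly should behave the same across versions. Treat that as an assumption and check it against your installed version.

```python
import pandas as pd

# Offset objects can be passed as freq instead of string aliases.
month_starts = pd.date_range("2013-01-01", periods=3, freq=pd.offsets.MonthBegin())
month_ends = pd.date_range("2013-01-01", periods=3, freq=pd.offsets.MonthEnd())

print(month_starts.strftime("%Y-%m-%d").tolist())
print(month_ends.strftime("%Y-%m-%d").tolist())
```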

---

`np.random.randn()`
- Returns a sample (or samples) from the __standard normal distribution__.
- The values are random floats drawn from a univariate normal (Gaussian) distribution with mean 0 and variance 1.
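
To see these properties on a concrete sample, a seeded draw makes the numbers reproducible. This sketch uses NumPy's newer `Generator` API (`np.random.default_rng`) rather than `np.random.randn`. The seed `0` and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded, so reruns give the same values
sample = rng.standard_normal(100_000)    # same distribution as np.random.randn

print(round(sample.mean(), 2))   # close to 0
print(round(sample.var(), 2))    # close to 1

# The shape is passed as one tuple instead of separate arguments.
matrix = rng.standard_normal((6, 4))
print(matrix.shape)
```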

In [31]:
df = pd.DataFrame(np.random.randn(6, 4), 
                  index=pd.date_range('20130101', periods=6),
                  columns=list("ABCD")
                 )

In [32]:
df

                   A         B         C         D
2013-01-01 -0.539466 -0.273859 -2.026441 -1.091532
2013-01-02  0.365958  0.701259 -0.782492  0.892827
2013-01-03  0.156535  0.896326 -0.487702  0.468390
2013-01-04 -0.697194 -1.197822  0.893838  0.535906
2013-01-05 -0.744688  0.705660 -0.099809 -1.908582
2013-01-06 -1.128910  0.247238 -0.456195 -0.496795


In [36]:
df.loc['2013-01-01']

A   -0.539466
B   -0.273859
C   -2.026441
D   -1.091532
Name: 2013-01-01 00:00:00, dtype: float64
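
`.loc` selects by label, and with a `DatetimeIndex` it also accepts date strings and label slices. The sketch below uses a frame shaped like `df`. The values are regenerated here, so they will differ from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(6, 4),
    index=pd.date_range("20130101", periods=6),
    columns=list("ABCD"),
)

row = df.loc["2013-01-01"]                    # one row, returned as a Series
window = df.loc["2013-01-02":"2013-01-04"]    # label slices include both ends
cell = df.loc[pd.Timestamp("2013-01-03"), "B"]  # row label and column label

print(window.shape)   # (3, 4)
```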

---

__Creating a `DataFrame`__ by passing a dictionary of objects, where the keys are the column labels and the values are the column values. Scalars such as `1.0` or `'foo'` are broadcast to every row.



In [68]:
df2 = pd.DataFrame(
    {
        'A': 1.0,
        'B': pd.Timestamp('2013-01-01'),
        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
        'D': np.array([3] * 4, dtype='int32'),
        'E': pd.Categorical(['test', 'train', 'test', 'train']),
        'F': 'foo'
    }
)

In [69]:
df2

     A          B    C  D      E    F
0  1.0 2013-01-01  1.0  3   test  foo
1  1.0 2013-01-01  1.0  3  train  foo
2  1.0 2013-01-01  1.0  3   test  foo
3  1.0 2013-01-01  1.0  3  train  foo


In [70]:
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object

In [71]:
df2.min()

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one
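
The error comes from column `E`: an unordered `Categorical` has no notion of smaller or larger. Two possible ways around it are sketched on a reduced copy of `df2` (the name `small` is hypothetical). One restricts the reduction to numeric columns, the other makes the categories ordered, as the message suggests.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    {
        "A": 1.0,
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
    }
)

# Option 1: only reduce over the numeric columns (A and D here).
numeric_min = small.min(numeric_only=True)
print(numeric_min)

# Option 2: give the categories an order, then min() is well defined.
ordered = small["E"].cat.as_ordered()
print(ordered.min())   # 'test', the first category in sorted order
```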


Two related date-time types:
- `datetime` (from Python's built-in `datetime` module)
- `Timestamp` (from `pandas`), the pandas counterpart of `datetime`


In [42]:
type(pd.Timestamp('2013-01-01'))

pandas._libs.tslibs.timestamps.Timestamp
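
A quick check of how the two types relate: `pd.Timestamp` is a subclass of `datetime.datetime`, so the two can be compared and converted in both directions. A minimal sketch:

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2013-01-01")
dt = datetime(2013, 1, 1)

print(isinstance(ts, datetime))   # True: Timestamp subclasses datetime
print(ts == dt)                   # True: same point in time
print(type(ts.to_pydatetime()))   # back to a plain datetime
print(type(pd.Timestamp(dt)))     # and from datetime to Timestamp
```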

In [47]:
np.array([3] * 4)

array([3, 3, 3, 3])

In [49]:
[3] * 4  # multiplying a list repeats it; no element-wise arithmetic happens

[3, 3, 3, 3]

In [51]:
np.array([3]) * 4

array([12])
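
The three cells above contrast list repetition with element-wise array arithmetic. If the goal is a constant array, as in column `D` of `df2`, `np.full` states that intent directly. This is an alternative, not what the notebook uses.

```python
import numpy as np

repeated = [3] * 4             # list repetition: [3, 3, 3, 3]
scaled = np.array([3]) * 4     # element-wise multiplication: array([12])

# np.full(shape, fill_value) builds a constant array in one call.
filled = np.full(4, 3, dtype="int32")
print(filled)

# Same contents as converting the repeated list.
print(np.array_equal(filled, np.array(repeated, dtype="int32")))
```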

---

In [53]:
pd.Series([1, 'a', 4])

0    1
1    a
2    4
dtype: object

In [60]:
pd.Series([1, 'a', 4]).loc[0]

1
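
Mixing integers and strings forces the `object` dtype, where each element keeps its original Python type. If only the numbers matter, `pd.to_numeric` with `errors='coerce'` is one way to convert them and mark everything else as missing. A small sketch:

```python
import pandas as pd

mixed = pd.Series([1, "a", 4])
print(mixed.dtype)                          # object
print([type(v).__name__ for v in mixed])    # element types are preserved

# Values that cannot be parsed as numbers become NaN.
numbers = pd.to_numeric(mixed, errors="coerce")
print(numbers.tolist())
print(numbers.dtype)                        # float64
```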

In [62]:
[2, 3] * 3

[2, 3, 2, 3, 2, 3]
