**02: Introduction to Pandas Series and datatypes.**


In [6]:
import pandas as pd
import numpy as np

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, Python object, etc.).


*Syntax:*  `pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=False)`\
*Most commonly used syntax:* `pandas.Series(data)` or `pandas.Series(data, index)`.

*A default of `None` means the argument is optional.*
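
As a small illustration of the optional parameters in the signature above, the sketch below sets `index`, `dtype`, and `name` explicitly. The values, the labels, and the name `'scores'` are made up for this example.

```python
import pandas as pd

# Hypothetical data: the values, labels, and name are illustrative only.
scores = pd.Series([90, 85, 77], index=['x', 'y', 'z'], dtype='float64', name='scores')

print(scores)
print(scores.name)   # the name passed to the constructor
print(scores.dtype)  # float64, because the dtype was requested explicitly
```

Without `dtype='float64'`, pandas would infer an integer dtype from the input list.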

0. Create an empty Series

In [4]:
s = pd.Series()
print(s)

Series([], dtype: object)
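
The default dtype of an empty Series has differed between pandas versions, so it can be safer to state the dtype explicitly. A minimal sketch, assuming a float Series is what is wanted:

```python
import pandas as pd

# Request the dtype explicitly instead of relying on the version-dependent default.
empty = pd.Series(dtype='float64')

print(len(empty))   # number of elements
print(empty.dtype)  # the dtype requested above
print(empty.empty)  # True when the Series has no elements
```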


1. Create a Series from a list or tuple

In [33]:
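data = [1, 2, 3, 4, 5]
s = pd.Series(data)  # build the Series from a list
print(s)
data = (1, 2, 3, 4, 5)
# data = {1, 2, 3, 4, 5}  # <-- not allowed: a set is unordered, so pandas raises TypeError
s = pd.Series(data)  # a tuple works the same way
s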

0    1
1    2
2    3
3    4
4    5
dtype: int64


0    1
1    2
2    3
3    4
4    5
dtype: int64
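
The restriction on sets mentioned in the cell above can be checked directly. The sketch below catches the error pandas raises for a set and then uses `sorted()` to turn the set into a list first; sorting is just one possible way to impose an order.

```python
import pandas as pd

data = {3, 1, 2}

try:
    pd.Series(data)
    set_rejected = False
except TypeError as err:
    # pandas refuses unordered collections such as set and frozenset
    set_rejected = True
    print(f"TypeError: {err}")

# Converting to a sorted list gives the elements a definite order.
s_from_set = pd.Series(sorted(data))
print(s_from_set)
```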

2. Create a Series from an `ndarray` (NumPy array)

In [30]:
data = np.array([1,2.0,3,4,5])
s = pd.Series(data)
s 


0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
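
The output above is `float64` because NumPy stores all elements of an array with one common dtype, and a single float upcasts the rest; the Series then inherits the array's dtype. A short sketch comparing the two cases (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

ints = np.array([1, 2, 3])      # all integers
mixed = np.array([1, 2.0, 3])   # one float upcasts the whole array

s_ints = pd.Series(ints)
s_mixed = pd.Series(mixed)

print(s_ints.dtype)   # an integer dtype (the exact width can depend on the platform)
print(s_mixed.dtype)  # float64
```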

3. Compare it with a DataFrame created from the same data

In [35]:
data = np.array([1,2,3,4,5])
data = [1,2,3,4,5]  # overrides the ndarray above; either input produces the same DataFrame
df = pd.DataFrame(data)
df

   0
0  1
1  2
2  3
3  4
4  5


*Note: it creates a default column name `0`.*
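
To make the comparison concrete, the sketch below builds both objects from the same list and prints their dimensions. The column name `'value'` is a hypothetical choice, passed through the `columns` parameter to replace the default `0`.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
s = pd.Series(data)
df = pd.DataFrame(data, columns=['value'])  # 'value' is a hypothetical column name

print(s.ndim, s.shape)    # a Series is one-dimensional
print(df.ndim, df.shape)  # a DataFrame is two-dimensional: rows x columns
print(list(df.columns))
```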

4. Check types of Series and DataFrame

In [57]:
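# rebuild the string Series so that this cell runs on its own
s = pd.Series({'a':'apple', 'b':'banana', 'c':'cherry', 'd':'date'})
print(s)
s.dtype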

a     apple
b    banana
c    cherry
d      date
dtype: object


dtype('O')

In [24]:
print(df)
df.dtypes

   0
0  1
1  2
2  3
3  4
4  5


0    int64
dtype: object
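
The trailing `dtype: object` in the `df.dtypes` output can be confusing: `df.dtypes` is itself a Series whose values are dtype objects, one per column. A small sketch with made-up columns `a` and `b`:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1.5, 2.5, 3.5]})  # hypothetical columns

print(type(s).__name__)          # Series
print(type(df).__name__)         # DataFrame
print(type(df.dtypes).__name__)  # Series: one dtype per column
print(type(df['b']).__name__)    # Series: a single column of a DataFrame
print(df['b'].dtype)             # float64
```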

*Let's look more closely at the Series structure by storing items of different types in it.*

In [62]:
s = pd.Series([1.0,100,pd.Timestamp('20230922'),np.nan,'foo'])
s

0                    1.0
1                    100
2    2023-09-22 00:00:00
3                    NaN
4                    foo
dtype: object

In [59]:
s.dtypes 

dtype('O')
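
With dtype `object`, the Series stores references to ordinary Python objects, so each element keeps its own type. The sketch below lists the element types and, for contrast, shows that a purely numeric Series with a missing value gets a single numeric dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 100, pd.Timestamp('20230922'), np.nan, 'foo'])

# Each element of an object Series keeps its own Python type.
element_types = [type(x).__name__ for x in s]
print(element_types)

# Only numbers and NaN: pandas can use one numeric dtype for the whole Series.
numeric = pd.Series([1.0, 100, np.nan])
print(numeric.dtype)
```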

*Now let's do the same with a DataFrame: each column holds a different type, so each column gets its own dtype.*

In [49]:
df = pd.DataFrame({'float': [1.0,2],
                   'int': [100,200],
                   'datetime': [pd.Timestamp('20230922'),pd.Timestamp('20220922')],
                   'NaN': [np.nan, 2*np.nan],
                   'string': ['foo','bar']})
df

   float  int   datetime  NaN  string
0    1.0  100 2023-09-22  NaN     foo
1    2.0  200 2022-09-22  NaN     bar


In [38]:
df.dtypes

float              float64
int                  int64
datetime    datetime64[ns]
NaN                float64
string              object
dtype: object
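
Because every column carries its own dtype, columns can be picked by type. A minimal sketch using `DataFrame.select_dtypes` on the same data as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'float': [1.0, 2],
                   'int': [100, 200],
                   'datetime': [pd.Timestamp('20230922'), pd.Timestamp('20220922')],
                   'NaN': [np.nan, np.nan],
                   'string': ['foo', 'bar']})

# 'number' matches integer and float columns; NaN is a float value.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
datetime_cols = df.select_dtypes(include='datetime').columns.tolist()

print(numeric_cols)
print(datetime_cols)
```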

5. Create a Series from a dictionary. Note that the keys become the index labels, replacing the default integer index.

In [50]:
data = {'a':'apple', 'b':'banana', 'c':'cherry', 'd':'date'}
s = pd.Series(data)
s

a     apple
b    banana
c    cherry
d      date
dtype: object
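
When a dictionary is combined with an explicit `index`, pandas looks up each label among the keys: the result follows the order of `index`, and labels without a matching key get `NaN`. A short sketch, where `'z'` is deliberately not a key:

```python
import pandas as pd

data = {'a': 'apple', 'b': 'banana', 'c': 'cherry', 'd': 'date'}

# 'z' is not a key of data, so its value is missing (NaN);
# keys that are not listed in index ('b' and 'c') are dropped.
s = pd.Series(data, index=['d', 'a', 'z'])
print(s)
```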

6. We can get the same result from an array or list by passing the `index` labels as a separate list.

In [51]:
data = ['apple', 'banana', 'cherry', 'date']
index = ['a', 'b', 'c', 'd']
s = pd.Series(data,index)
s

a     apple
b    banana
c    cherry
d      date
dtype: object
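
When data and index are passed as separate lists, their lengths must match. The sketch below uses a deliberately short index (named `short_index` here for illustration) to show the error pandas raises otherwise.

```python
import pandas as pd

data = ['apple', 'banana', 'cherry', 'date']
short_index = ['a', 'b', 'c']  # one label too few, on purpose

try:
    pd.Series(data, short_index)
    mismatch_rejected = False
except ValueError as err:
    # pandas reports that the lengths of values and index differ
    mismatch_rejected = True
    print(f"ValueError: {err}")
```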

*In one shot:*

In [54]:
s = pd.Series(data = ['apple', 'banana', 'cherry', 'date'], index = ['a', 'b', 'c', 'd'])
s

a     apple
b    banana
c    cherry
d      date
dtype: object

*Or, more concisely, pass both arguments positionally (the `data=` and `index=` keywords can be omitted):*

In [55]:
s = pd.Series(['apple', 'banana', 'cherry', 'date'], ['a', 'b', 'c', 'd'])
s 

a     apple
b    banana
c    cherry
d      date
dtype: object
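
With custom labels in place, elements can be reached either by label or by position. A brief sketch using `.loc` (labels) and `.iloc` (integer positions) on the same Series:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'date'], ['a', 'b', 'c', 'd'])

print(s.loc['b'])               # access by label
print(s.iloc[1])                # access by position
print(s.loc['b':'c'].tolist())  # label slices include both endpoints
print(s.iloc[1:3].tolist())     # position slices exclude the end position
```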

7. Create a Series from a scalar (constant). The value is repeated for every label in `index`.

In [64]:
s = pd.Series('apple', index=[10, 11, 12, 13, 14, 15])
s


10    apple
11    apple
12    apple
13    apple
14    apple
15    apple
dtype: object
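
The same broadcasting works for numeric scalars, which is a convenient way to initialize a Series with a constant. A small sketch (the variable names are illustrative); note that a scalar without an `index` gives a Series of length one.

```python
import pandas as pd

# The scalar is repeated once per index label.
zeros = pd.Series(0.0, index=range(3))
print(zeros.tolist())

# Without an index there is nothing to broadcast over.
single = pd.Series(5)
print(len(single))
```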