# <center>Operation and Aggregation</center>

* [<span style='color:purple'>Numerical series operation</span>](#Numerical-series-operation)
* [<span style='color:purple'>Text series operations</span>](#Text-series-operations)
* [<span style='color:purple'>Numerical series aggregation</span>](#Numerical-series-aggregation)
* [<span style='color:purple'>Categorical series aggregation</span>](#Categorical-series-aggregation)

You can use operators and methods to perform numeric operations on a Series.

In [1]:
import pandas as pd
import numpy as np

## Numerical series operation

In [4]:
saturday_sales = pd.Series(range(10, 100, 12))

In [5]:
saturday_sales

0    10
1    22
2    34
3    46
4    58
5    70
6    82
7    94
dtype: int64

In [6]:
saturday_sales *2

0     20
1     44
2     68
3     92
4    116
5    140
6    164
7    188
dtype: int64

In [30]:
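# monday_sales is not defined in the cells shown here; the output printed later
# matches the values of saturday_sales, so the same values are assumed.
monday_sales = saturday_sales.copy()
dollar_monday = '$' + monday_sales.astype("float").astype("string")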

In [10]:
# np.nan is the supported spelling; the np.NAN alias was removed in NumPy 2.0
series_number = pd.Series([1, 3, np.nan], index=['day1', 'day2', 'day3'])

In [11]:
series_number

day1    1.0
day2    3.0
day3    NaN
dtype: float64

In [12]:
series_number *4

day1     4.0
day2    12.0
day3     NaN
dtype: float64

In [22]:
mean = series_number.mean()

In [24]:
# fill_value replaces the NaN with the mean (2.0) before multiplying by 4
series2 = series_number.mul(4, fill_value=mean)

In [25]:
series2

day1     4.0
day2    12.0
day3     8.0
dtype: float64

In [26]:
series2/2

day1    2.0
day2    6.0
day3    4.0
dtype: float64

In [28]:
# adding two series aligns them on the index; a NaN in either operand stays NaN
series_number + series2

day1     5.0
day2    15.0
day3     NaN
dtype: float64
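
The addition above keeps `day3` as `NaN` because a missing value on either side makes the result missing. The following is a minimal sketch, using two small hypothetical series (`a` and `b` are illustrative names, not part of this notebook), of how `Series.add` with `fill_value` treats a value missing on one side as a default instead.

```python
import numpy as np
import pandas as pd

# Hypothetical example data mirroring series_number and series2.
a = pd.Series([1.0, 3.0, np.nan], index=["day1", "day2", "day3"])
b = pd.Series([4.0, 12.0, 8.0], index=["day1", "day2", "day3"])

plain_sum = a + b                    # the NaN in `a` propagates to the result
filled_sum = a.add(b, fill_value=0)  # the missing side is treated as 0 first
```

If a label is missing in both series, the result stays missing even with `fill_value`.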

## Text series operations

The pandas .str accessor lets you apply many string methods to every element of a Series. Some of these methods are:
* .strip(), .lstrip(), .rstrip()
* .upper(), .lower()
* .slice(start, stop, step)
* .count()
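
As a quick illustration of the methods listed above, here is a small sketch on a hypothetical series (`labels` and its values are made up for this example and do not appear elsewhere in the notebook).

```python
import pandas as pd

# Hypothetical example data with stray whitespace and mixed case.
labels = pd.Series(["  Monday ", "tuesday  ", " Friday"])

trimmed = labels.str.strip()           # remove surrounding whitespace
lowered = trimmed.str.lower()          # normalise case
first_three = lowered.str.slice(0, 3)  # characters at positions 0, 1 and 2
day_counts = lowered.str.count("day")  # occurrences of the pattern "day"
```

Each call returns a new Series, so the methods can also be chained, for example `labels.str.strip().str.lower()`.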

In [31]:
dollar_monday

0    $10.0
1    $22.0
2    $34.0
3    $46.0
4    $58.0
5    $70.0
6    $82.0
7    $94.0
dtype: string

In [32]:
dollar_monday.str.strip("$").astype("float")

0    10.0
1    22.0
2    34.0
3    46.0
4    58.0
5    70.0
6    82.0
7    94.0
dtype: float64

In [33]:
series_number


day1    1.0
day2    3.0
day3    NaN
dtype: float64

In [41]:
string_series = pd.Series(['monday','tuesday','thursday','friday'])

In [42]:
string_series

0      monday
1     tuesday
2    thursday
3      friday
dtype: object

In [50]:
x = string_series.str.strip('day')
x

0      mon
1     tues
2    thurs
3      fri
dtype: object
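
Note that `.str.strip('day')` does not remove the word "day". It trims any of the characters `d`, `a` and `y` from both ends, which happens to give the expected result for the weekday names above. A minimal sketch of the difference, assuming pandas 1.4 or later for `.str.removesuffix` and using hypothetical values ("yesterday" and "daily" are not in the notebook data):

```python
import pandas as pd

# Hypothetical example data chosen to expose the difference.
days = pd.Series(["monday", "yesterday", "daily"])

# strip("day") trims any of the characters 'd', 'a', 'y' from both ends.
stripped = days.str.strip("day")

# removesuffix("day") drops the literal suffix "day" only, if present.
suffix_removed = days.str.removesuffix("day")
```

Here `stripped` contains "mon", "ester" and "il", while `suffix_removed` contains "mon", "yester" and "daily".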

In [49]:
x.str.upper()

0      MON
1     TUES
2    THURS
3      FRI
dtype: object

In [52]:
string2 = pd.Series(['monday lunch', 'friday breakfast', 'saturday snacks'])

In [53]:
string2

0        monday lunch
1    friday breakfast
2     saturday snacks
dtype: object

In [55]:
split_string2 = string2.str.split(" ",expand=True)

In [56]:
split_string2

          0          1
0    monday      lunch
1    friday  breakfast
2  saturday     snacks
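
With `expand=True` the split returns a DataFrame whose columns are numbered 0, 1, and so on. A small sketch of one way to make the result easier to work with: limiting the number of splits with `n` and assigning column names (the names `day` and `meal` are hypothetical, chosen for this example).

```python
import pandas as pd

string2 = pd.Series(["monday lunch", "friday breakfast", "saturday snacks"])

# n=1 splits at the first space only; expand=True returns a DataFrame.
parts = string2.str.split(" ", n=1, expand=True)
parts.columns = ["day", "meal"]  # hypothetical column names
```

After this, `parts["day"]` and `parts["meal"]` are ordinary string series.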


## Numerical series aggregation


In [59]:
inventory = pd.Series(range(11, 70, 7))

In [60]:
inventory

0    11
1    18
2    25
3    32
4    39
5    46
6    53
7    60
8    67
dtype: int64

In [62]:
inventory.index=['coffee','sugar','cocoa','cake','juice','coffee','sugar','cocoa','cake']

In [63]:
inventory

coffee    11
sugar     18
cocoa     25
cake      32
juice     39
coffee    46
sugar     53
cocoa     60
cake      67
dtype: int64

In [65]:
inventory.loc['coffee'].sum()

57

In [66]:
inventory.loc['cocoa'].mean()

42.5

In [67]:
inventory.max()

67

In [69]:
inventory.quantile([.25,.50,.75])

0.25    25.0
0.50    39.0
0.75    53.0
dtype: float64
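
Selecting one label at a time with `.loc` works, but when the index has repeated labels you may want the aggregate for every label at once. The sketch below is one possible approach, not something the notebook itself does: it groups on the index with `groupby(level=0)` and aggregates each group.

```python
import pandas as pd

# Same data as the inventory series above.
inventory = pd.Series(
    range(11, 70, 7),
    index=["coffee", "sugar", "cocoa", "cake", "juice",
           "coffee", "sugar", "cocoa", "cake"],
)

# Group rows that share an index label, then aggregate each group.
totals = inventory.groupby(level=0).sum()
summary = inventory.groupby(level=0).agg(["sum", "mean", "max"])
```

`totals["coffee"]` matches the 57 computed with `.loc` above, and `summary` is a DataFrame with one row per label and one column per aggregation.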

In [72]:
transaction_df = pd.read_csv("transactions.csv")

In [73]:
transaction_df

             date  store_nbr  transactions
0      2013-01-01         25           770
1      2013-01-02          1          2111
2      2013-01-02          2          2358
3      2013-01-02          3          3487
4      2013-01-02          4          1922
...           ...        ...           ...
83483  2017-08-15         50          2804
83484  2017-08-15         51          1573
83485  2017-08-15         52          2255
83486  2017-08-15         53           932


In [74]:
# selecting a single column already returns a Series
transaction_series = transaction_df["transactions"]

In [75]:
transaction_series

0         770
1        2111
2        2358
3        3487
4        1922
         ... 
83483    2804
83484    1573
83485    2255
83486     932
83487     802
Name: transactions, Length: 83488, dtype: int64

In [76]:
# first 5 rows of the series
transaction_series[:5]

0     770
1    2111
2    2358
3    3487
4    1922
Name: transactions, dtype: int64

In [78]:
transaction_series.count()

83488

In [79]:
transaction_series.mean()

1694.6021583940208

In [82]:
transaction_series.quantile([0.25,0.5,0.75])

0.25    1046.0
0.50    1393.0
0.75    2079.0
Name: transactions, dtype: float64

In [83]:
transaction_series.median()

1393.0

In [84]:
transaction_series.min()

5

In [85]:
transaction_series.max()

8359
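
The statistics computed one by one above can also be collected in a single call. A minimal sketch, using a small hypothetical stand-in for `transaction_series` (the five values are the first rows shown above; `sample` is an illustrative name) because `transactions.csv` is not available here:

```python
import pandas as pd

# Hypothetical stand-in for transaction_series.
sample = pd.Series([770, 2111, 2358, 3487, 1922], name="transactions")

# Several named aggregations at once; the result is a Series indexed by name.
stats = sample.agg(["count", "mean", "median", "min", "max"])

# describe() reports count, mean, std, min, the quartiles and max.
overview = sample.describe()
```

On the full series the same calls would return the count, mean, median, minimum and maximum shown in the cells above.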

## Categorical series aggregation

In [91]:
inventory_list = pd.Series(['coffee', 'sugar', 'cocoa', 'cake', 'juice',
                            'coffee', 'sugar', 'cocoa', 'cake'])

In [92]:
inventory_list.unique()

array(['coffee', 'sugar', 'cocoa', 'cake', 'juice'], dtype=object)

In [93]:
inventory_list.nunique()

5

In [94]:
inventory_list.value_counts()

coffee    2
sugar     2
cocoa     2
cake      2
juice     1
dtype: int64

In [96]:
inventory_list.value_counts(normalize=True)

coffee    0.222222
sugar     0.222222
cocoa     0.222222
cake      0.222222
juice     0.111111
dtype: float64
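
To see where the proportions come from, the sketch below recomputes them by hand: each normalised value is the raw count divided by the total number of non-missing entries. The name `manual_shares` is introduced only for this illustration.

```python
import pandas as pd

inventory_list = pd.Series(["coffee", "sugar", "cocoa", "cake", "juice",
                            "coffee", "sugar", "cocoa", "cake"])

counts = inventory_list.value_counts()
shares = inventory_list.value_counts(normalize=True)

# Raw counts divided by their total give the same proportions.
manual_shares = counts / counts.sum()
```

For example, coffee appears 2 times out of 9 entries, which gives the 0.222222 shown above.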