# **Pandas Series Practice Exercises**


A pandas Series is a one-dimensional labeled array. It can hold any data type supported in Python, and each value is paired with a label that is used to retrieve it. Together these labels form the index; they are typically strings or integers. The Series is the main pandas data structure for storing one-dimensional data.

**Common methods used with Pandas Series**


*.head(), .name, .to_frame(), .reset_index(), .isin(), .value_counts(), .index, pd.qcut(), .to_numpy().reshape(), .take(), pd.concat(), .tolist(), Index.get_loc(), .map(), .diff()*


---


Please refer to the [Pandas Series Documentation](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) for a full list of methods and how to use them.

To complete the problems below, we first need to import the pandas and NumPy libraries. If you haven't installed them yet, install them with pip or conda, either globally or in your virtual environment.


* The common alias for pandas is pd.  
* The common alias for numpy is np.




In [2]:
#pip install pandas
#pip install numpy 

import pandas as pd
import numpy as np

**Problem 1**

We can easily convert lists, tuples, dictionaries and NumPy arrays into a Pandas Series by passing the Python object directly as an argument to the Series() constructor. Because we imported pandas with the alias pd in the cell above, we call the constructor as pd.Series().



*  Convert the 3 objects below into 3 separate Pandas Series objects.
*  Use the .head() method to see the first 5 items in each Series





In [3]:
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

ser1 = pd.Series(mylist)
print(ser1.head())

ser2 = pd.Series(myarr)
print(ser2.head())

ser3 = pd.Series(mydict)
print(ser3.head())

0    a
1    b
2    c
3    e
4    d
dtype: object
0    0
1    1
2    2
3    3
4    4
dtype: int32
a    0
b    1
c    2
e    3
d    4
dtype: int32
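The exercise mentions tuples but does not convert one. A minimal sketch of that case, plus how a dict's keys are used, assuming small illustrative data rather than the objects above:

```python
import pandas as pd

# A tuple is treated like a list: values in order, default index 0..n-1
ser_tuple = pd.Series(('x', 'y', 'z'))

# A dict supplies both parts: keys become the index labels, values the data
ser_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

print(ser_tuple.index.tolist())  # [0, 1, 2]
print(ser_dict.index.tolist())   # ['a', 'b', 'c']
```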

**Problem 2**

We can name a Pandas Series by passing the name parameter to its constructor.

*   Use the name parameter to add 'alphabet' as the name of our series.
*   Use the .head() method to see the name of our Series displayed



In [6]:
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'), name='alphabet')
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabet, dtype: object


**Problem 3**

We can also name a Series after it has been created by assigning to its .name attribute.



*   Use the .name attribute to add the name 'alphabet1' to the Series object after it has been created.
*   Use the .head() method to verify





In [6]:
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser.name = 'alphabet1'
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabet1, dtype: object
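The name also carries over when a Series becomes a DataFrame. A short sketch using .rename() and .to_frame() from the method list at the top; the data and the name 'letters' are illustrative:

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c'])

# .rename() with a scalar returns a new Series carrying that name;
# the original Series keeps its own name (None here)
named = ser.rename('letters')

# .to_frame() builds a one-column DataFrame and uses the Series
# name as the column label
df = named.to_frame()

print(ser.name)             # None
print(df.columns.tolist())  # ['letters']
```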

**Problem 4**

We can use arithmetic operations to perform element-wise calculations on two or more Series.

*   Create a new object that adds the two series below
*   Create a new object that subtracts the two series below
*   Create a new object that multiplies the two series below
*   Create a new object that divides the two series below

Make sure you display your outputs.





In [18]:
ser1 = pd.Series([2, 4, 6, 8, 10])
ser2 = pd.Series([1, 3, 5, 7, 9])

# .add(), .sub(), .mul() and .div() are the method forms of +, -, * and /
ser4_add = ser1.add(ser2)
print(ser4_add)
ser5_sub = ser1.sub(ser2)
print(ser5_sub)
ser6_mul = ser1.mul(ser2)
print(ser6_mul)
ser7_div = ser1.div(ser2)  # true division, so the result is float64
print(ser7_div)

0     3
1     7
2    11
3    15
4    19
dtype: int64
0    1
1    1
2    1
3    1
4    1
dtype: int64
0     2
1    12
2    30
3    56
4    90
dtype: int64
0    2.000000
1    1.333333
2    1.200000
3    1.142857
4    1.111111
dtype: float64
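The operators give the same results as the methods, and both pair values by index label rather than by position. A sketch, assuming small hand-made Series with string labels, of what happens when the labels only partly overlap:

```python
import pandas as pd

ser1 = pd.Series([2, 4, 6, 8, 10])
ser2 = pd.Series([1, 3, 5, 7, 9])

# The operators are shorthand for the methods
assert (ser1 + ser2).equals(ser1.add(ser2))
assert (ser1 / ser2).equals(ser1.div(ser2))

# Labels present in only one Series produce NaN ...
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])
summed = a + b                       # x NaN, y 12.0, z 23.0

# ... unless the method form is given a fill_value for the missing side
filled = a.add(b, fill_value=0)      # x 1.0, y 12.0, z 23.0
print(summed)
print(filled)
```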


**Problem 5**

We can use comparison operators to compare elements of two Series.

Use the series below to perform the following:


*   Check if the elements in ser1 are equal to the elements in ser2
*   Check if the elements in ser1 are greater than the elements in ser2
*   Check if the elements in ser1 are less than the elements in ser2





In [22]:
ser1 = pd.Series([2, 4, 5, 8, 10])
ser2 = pd.Series([1, 3, 5, 7, 10])
eq_ser = ser1.eq(ser2)
print(eq_ser)
gt_ser = ser1.gt(ser2)
print(gt_ser)
lt_ser = ser1.lt(ser2)
print(lt_ser)

0    False
1    False
2     True
3    False
4     True
dtype: bool
0     True
1     True
2    False
3     True
4    False
dtype: bool
0    False
1    False
2    False
3    False
4    False
dtype: bool
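Comparisons depend on labels too. In this sketch (illustrative data), the == operator only accepts identically-labeled Series, while .eq() aligns the labels first and treats unmatched labels as not equal:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([1, 5, 3], index=['x', 'y', 'z'])

# With identical labels, the operator and the method agree
same = (a == b).equals(a.eq(b))

# With different labels, the == operator raises ValueError ...
c = pd.Series([1, 2], index=['x', 'w'])
try:
    a == c
except ValueError as err:
    print(err)  # message about identically-labeled Series

# ... while .eq() aligns on the union of labels; unmatched ones compare False
aligned = a.eq(c)
print(aligned)
```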


**Problem 6**

We can add elements to a Pandas Series by combining it with another Series. Use the Series below to perform the following:


*   Create a new Series by using pd.concat() to add 'penguin' and 'lion' to the Series object (Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0)
*   Use the .reset_index() method to reset the index values.

(Both steps can be done in one line)



In [239]:
ser = pd.Series(['cat', 'dog', 'elephant', 'frog'])
ser1 = pd.Series(['penguin', 'lion'])

# Series.append() no longer exists in pandas 2.x, and the private ._append()
# is not a public API to rely on; pd.concat() is the supported replacement.
# reset_index(drop=True) renumbers the combined index 0..5 and discards the old labels.
ser_concat = pd.concat([ser, ser1]).reset_index(drop=True)
print(ser_concat)

0         cat
1         dog
2    elephant
3        frog
4     penguin
5        lion
dtype: object
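On ignore_index versus reset_index(): a sketch suggesting that for this data the two produce the same Series, the difference being only when the renumbering happens:

```python
import pandas as pd

ser = pd.Series(['cat', 'dog', 'elephant', 'frog'])
ser1 = pd.Series(['penguin', 'lion'])

# Without either option the original labels repeat: 0, 1, 2, 3, 0, 1
stacked = pd.concat([ser, ser1])

# Option 1: renumber afterwards (drop=True discards the old labels
# instead of keeping them as a new column)
via_reset = stacked.reset_index(drop=True)

# Option 2: tell concat to ignore the input labels up front
via_ignore = pd.concat([ser, ser1], ignore_index=True)

print(stacked.index.tolist())        # [0, 1, 2, 3, 0, 1]
print(via_reset.equals(via_ignore))  # True
```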


**Problem 7**

We can customize the index of our Series with the index parameter of the Series constructor. Use the Series below to perform the following:


*   Set the index to ['A', 'B', 'C', 'D']
*   Use the .reindex() method after the Series object is created to change the order of the index to ['B', 'A', 'D', 'C']



In [38]:
ser = pd.Series(['cat', 'dog', 'elephant', 'frog'], index=['A', 'B', 'C', 'D'])
print(ser)
ser = ser.reindex(['B', 'A', 'D', 'C'])
print(ser)

A         cat
B         dog
C    elephant
D        frog
dtype: object
B         dog
A         cat
D        frog
C    elephant
dtype: object
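.reindex() is not limited to reordering. In this sketch (the label 'Z' is made up), a label missing from the original index is added with NaN, or with fill_value when one is given:

```python
import pandas as pd

ser = pd.Series(['cat', 'dog'], index=['A', 'B'])

# 'Z' does not exist in the original index, so it is added with NaN ...
with_nan = ser.reindex(['B', 'A', 'Z'])

# ... or with a value of your choice via fill_value
filled = ser.reindex(['B', 'A', 'Z'], fill_value='unknown')

print(with_nan)
print(filled)
```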


**Problem 8**

Find the elements in the first series (ser1) not in the second series (ser2).


*   Use boolean indexing (single square brackets) on ser1 with the negated .isin() mask (~) to drop the items that are also present in ser2. This leaves only the items not present in ser2.



In [72]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64

**Problem 9**

Use the Series below to perform the following:
*   Use a built-in Pandas method to find the mean of the Series
*   Use a built-in Pandas method to find the standard deviation of the Series



In [74]:
ser = pd.Series(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 3])
print(ser.mean())
print(ser.std())

4.818181818181818
2.522624895547565
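One detail worth knowing is which standard deviation .std() computes. A sketch comparing it with NumPy's default on the same data:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 3])

# pandas .std() defaults to the sample standard deviation (ddof=1) ...
sample_std = ser.std()

# ... while np.std() defaults to the population version (ddof=0)
population_std = np.std(ser.to_numpy())

print(np.isclose(ser.std(ddof=0), population_std))  # True
print(sample_std > population_std)                   # True
```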


**Problem 10**

Find the elements that are not common to both Series (drop every value that appears in both).

*   Create a new Series using np.union1d()
*   Create a new Series using np.intersect1d()
*   Use single square bracket notation and the .isin() method to find the final result





In [87]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
ser3 = pd.Series(np.union1d(ser1, ser2))
print(ser3)
ser4 = pd.Series(np.intersect1d(ser1, ser2))
print(ser4)
ser5 = ser3[~ser3.isin(ser4)]
print(ser5)

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64
0    4
1    5
dtype: int64
0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
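NumPy also offers this symmetric difference as a single call, np.setxor1d. A sketch on the same data; note that the result gets a fresh 0..n-1 index rather than ser3's labels:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# np.setxor1d returns the sorted, unique values that appear in exactly
# one of the two inputs
not_common = pd.Series(np.setxor1d(ser1, ser2))
print(not_common.tolist())  # [1, 2, 3, 6, 7, 8]
```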


**Problem 11**

Find the minimum, maximum, 25th percentile, and 75th percentile of the series.

*   Use np.percentile to find the percentiles above





In [242]:
ser = pd.Series(np.random.normal(10, 5, 25))  # unseeded, so the values differ on every run

print("Min-", ser.min())
print("Max-", ser.max())
print("25th Percentile-", np.percentile(ser, 25))
print("75th Percentile-", np.percentile(ser, 75))
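Because the series above is random, here is a sketch on fixed data (an assumption for reproducibility) showing that Series.quantile and np.percentile agree when given fractions and percentages respectively:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Series.quantile takes fractions (0.25) where np.percentile takes
# percentages (25); both default to linear interpolation
q = ser.quantile([0.25, 0.75])
p = np.percentile(ser, [25, 75])

print(q.tolist())            # [3.25, 7.75]
print(p.tolist())            # [3.25, 7.75]
print(ser.min(), ser.max())  # 1 10
```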


**Problem 12**

Obtain the count of each unique item in the series.


*   Use the .value_counts() method to find the frequency count of each unique item




In [243]:
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
ser.value_counts()

d    6
f    5
c    4
b    4
h    4
g    3
a    3
e    1
Name: count, dtype: int64
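value_counts() can also report relative frequencies instead of raw counts. A sketch on a fixed, illustrative Series:

```python
import pandas as pd

ser = pd.Series(list('aabbbc'))

counts = ser.value_counts()                # b: 3, a: 2, c: 1
shares = ser.value_counts(normalize=True)  # fractions that sum to 1
print(counts.to_dict())
print(shares.round(3).to_dict())
```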

**Problem 13**

Keep the two most frequent items in the series and change all items that are not those two into "Other".



*   Use .isin(), .value_counts(), and the .index attribute to complete the task above



In [128]:
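# Note: np.random.RandomState(100) on its own builds an unused generator and
# does not seed np.random; call np.random.seed(100) first for repeatable output.
ser = pd.Series(np.random.randint(1, 5, [12]))
print(ser)
top_two = ser.value_counts().index[:2]
print(top_two)
# .where() keeps values where the mask is True and puts 'Other' elsewhere;
# assigning strings into an integer Series in place is deprecated since pandas 2.1
ser = ser.where(ser.isin(top_two), 'Other')
print(ser)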

0     3
1     4
2     3
3     4
4     4
5     4
6     1
7     3
8     3
9     3
10    2
11    3
dtype: int32
Index([3, 4], dtype='int32')
0         3
1         4
2         3
3         4
4         4
5         4
6     Other
7         3
8         3
9         3
10    Other
11        3
dtype: object


**Problem 14**

Bin the series below into 10 equal-sized deciles and replace each value with the name of its bin.

*   Use the pd.qcut() function to complete the task




In [247]:
#ser = pd.Series(np.random.random(20))  # original exercise data
ser = pd.Series([1, 2, 3, 4, 6, 7, 8, 9, 10])
print(ser)
# q=10 cuts the data at its deciles (10th, 20th, ..., 90th percentiles);
# labels replaces each interval with the matching bin name
decile_labels = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']
ser_deciles = pd.qcut(ser, q=10, labels=decile_labels)
print(ser_deciles.tolist())

0     1
1     2
2     3
3     4
4     6
5     7
6     8
7     9
8    10
dtype: int64
['1st', '2nd', '3rd', '4th', '5th', '7th', '8th', '9th', '10th']
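To see where the decile boundaries fall, pd.qcut can return integer bin codes together with the computed edges. A sketch on the same nine values; because there are fewer values than bins, one bin (code 5, the interval from 6.0 to 6.8) ends up empty:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 6, 7, 8, 9, 10])

# labels=False returns integer bin codes (0 = first decile, 9 = last)
# and retbins=True also returns the 11 computed bin edges
codes, edges = pd.qcut(ser, q=10, labels=False, retbins=True)

print(codes.tolist())  # [0, 1, 2, 3, 4, 6, 7, 8, 9]
print(np.round(edges, 2).tolist())
```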

**Problem 15**

Find the positions of the numbers that are multiples of 3 in the Series below.

*  Use np.argwhere() to complete the task 




In [189]:
ser = pd.Series(np.random.randint(1, 10, 7))
print(ser)
print(np.argwhere(ser%3==0)) # Gives positions

0    6
1    3
2    9
3    5
4    5
5    1
6    9
dtype: int32
[[0]
 [1]
 [2]
 [6]]
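np.argwhere returns a 2-D array with one row per match. A sketch, using the values printed above as fixed data, of flatter alternatives:

```python
import numpy as np
import pandas as pd

ser = pd.Series([6, 3, 9, 5, 5, 1, 9])
mask = (ser % 3 == 0).to_numpy()

# Flatten argwhere's (n, 1) result, or use np.flatnonzero directly
from_argwhere = np.argwhere(mask).ravel().tolist()
from_flatnonzero = np.flatnonzero(mask).tolist()

# For index labels rather than positions, filter the Series and read its index
labels = ser[ser % 3 == 0].index.tolist()

print(from_argwhere)     # [0, 1, 2, 6]
print(from_flatnonzero)  # [0, 1, 2, 6]
print(labels)            # [0, 1, 2, 6] (labels equal positions with the default index)
```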


**Problem 16**

From ser, extract the items at positions in list pos.

*   Use the .take() method to complete the task



In [132]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]
ser_new = ser.take(pos)  # selects by position; the original index labels are kept
print(ser_new)

0     a
4     e
8     i
14    o
20    u
dtype: object
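.take() is position-based. A sketch with a made-up non-default index, where positions and labels differ:

```python
import pandas as pd

# A non-default index makes the position/label difference visible
ser = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_position = ser.take([0, 2])   # positions 0 and 2 -> 'a', 'c'
same_as_iloc = ser.iloc[[0, 2]]  # .iloc is also position-based
by_label = ser.loc[[10, 30]]     # .loc uses the labels instead

print(by_position.equals(same_as_iloc))  # True
```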


**Problem 17**

Stack ser1 and ser2 vertically.

*   Use pd.concat() to complete the task (the .append() method was removed in pandas 2.0)



In [249]:
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))
ser3 = pd.concat([ser1,ser2],axis=0).reset_index(drop=True)
print(ser3)

0    0
1    1
2    2
3    3
4    4
5    a
6    b
7    c
8    d
9    e
dtype: object
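For comparison, a sketch of the horizontal case: with axis=1, pd.concat places the Series side by side as the columns of a DataFrame, aligning rows on the shared index labels:

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# axis=0 (the default) stacks vertically into one longer Series;
# axis=1 builds a DataFrame with one column per Series
side_by_side = pd.concat([ser1, ser2], axis=1)
print(side_by_side.shape)  # (5, 2)
```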


**Problem 18**

Get the positions of items of ser2 in ser1 as a list.

*   Use a list comprehension, pd.Index, and .get_loc() to complete the task



In [250]:
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])
ser1_index = pd.Index(ser1)
print(ser1_index)
ser3 = [ser1_index.get_loc(i) for i in ser2]
print(ser3)

Index([10, 9, 6, 5, 3, 1, 12, 8, 13], dtype='int64')
[5, 4, 0, 8]
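Index.get_loc looks up one value at a time and raises KeyError for a missing value. A vectorised alternative is Index.get_indexer, sketched here with an extra value 99 (an illustrative assumption) that is not in ser1:

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13, 99])

# get_indexer looks up every value at once; values that are not
# present come back as -1 instead of raising KeyError
positions = pd.Index(ser1).get_indexer(ser2)
print(positions.tolist())  # [5, 4, 0, 8, -1]
```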


**Problem 19**

Capitalize the first character of each word in ser.

*   Use the .map() method combined with a string method to complete the task




In [252]:
ser = pd.Series(['how', 'to', 'kick', 'butt?'])
#ser.map(lambda x:x.capitalize())
ser.map(str.capitalize)

0      How
1       To
2     Kick
3    Butt?
dtype: object
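The .str accessor offers the same operation without .map(). A sketch, which also shows how capitalize() differs from title():

```python
import pandas as pd

ser = pd.Series(['how', 'to', 'kick', 'butt?'])

# Vectorised equivalent of ser.map(str.capitalize)
capped = ser.str.capitalize()

# capitalize() also lowercases the rest of the string; title() instead
# capitalizes every word inside a multi-word string
print('hELLO world'.capitalize())  # Hello world
print('hELLO world'.title())       # Hello World
```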

**Problem 20**

Calculate the number of characters for each element in the series.

*   Use the .map() method combined with the built-in len() function to complete the task.



In [152]:
ser = pd.Series(['how', 'to', 'kick', 'butt?'])
#ser.map(lambda x:len(x))
ser.map(len)

0    3
1    2
2    4
3    5
dtype: int64
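Similarly, .str.len() is the vectorised counterpart of .map(len). A sketch with a hypothetical missing value (None) added to show where the two differ:

```python
import pandas as pd

ser = pd.Series(['how', 'to', None])

# .str.len() returns NaN for the missing entry, whereas ser.map(len)
# would raise TypeError when it reaches None
lengths = ser.str.len()
print(lengths)
```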

**Problem 21**

Calculate the differences between consecutive numbers of ser, and then the differences of those differences. Display the output as a list.

*   Use the .diff() method to complete the task



In [157]:
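ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
# .tolist() converts the Series to a plain Python list; wrapping the
# Series in [...] would only create a one-element list holding the Series
first_diff = ser.diff()
print(first_diff.tolist())
print(first_diff.diff().tolist())

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]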
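.diff() also accepts a periods argument, which is easy to confuse with a second-order difference. A sketch on the same series:

```python
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# diff(periods=2) computes x[i] - x[i-2]; that is a different quantity
# from the difference of differences, diff().diff()
lag_two = ser.diff(periods=2).tolist()
second_order = ser.diff().diff().tolist()

print(lag_two)       # [nan, nan, 5.0, 7.0, 9.0, 11.0, 12.0, 14.0]
print(second_order)  # [nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
```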


**Problem 22**

Convert the date-strings to a timeseries.

*   Use the pd.to_datetime() function to complete the task


In [257]:
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
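# Since pandas 2.0, to_datetime infers one format (from the first string) and
# applies it to every element; these strings use different layouts, so
# format="mixed" asks pandas to infer the format of each element separately.
print(pd.to_datetime(ser, format="mixed"))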

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]
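When all strings share one layout, an explicit format string is stricter than format="mixed". A sketch with illustrative ISO-style dates, also showing errors='coerce':

```python
import pandas as pd

ser = pd.Series(['2010-01-01', '2011-02-02', '2012-03-03'])

# An explicit format parses every element with the same layout and
# raises if a string does not match it
dates = pd.to_datetime(ser, format='%Y-%m-%d')

# errors='coerce' turns unparseable entries into NaT instead of raising
bad = pd.to_datetime(pd.Series(['2010-01-01', 'not a date']),
                     format='%Y-%m-%d', errors='coerce')

print(dates.dt.year.tolist())  # [2010, 2011, 2012]
print(bad.isna().tolist())     # [False, True]
```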


**Problem 23**

Import the parse function from dateutil.parser and use it for the following:

*   Create a new object using the .map() method to parse ser
*   Create a new list of all the days of the month using the .dt.day attribute of the parsed series
*   Create a new list of all the weeks of the year using .dt.isocalendar().week on the parsed series (.dt.weekofyear was removed in pandas 2.0)
*   Create a new list of all the days of the year using the .dt.dayofyear attribute of the parsed series
*   Create a new list of all the days of the week using the .dt.day_name() method on the parsed series (.dt.weekday_name was removed in pandas 1.0)





In [196]:
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
from dateutil.parser import parse

p = ser.map(parse)
# .tolist() returns plain Python lists, as the task asks
p_days_of_month = p.dt.day.tolist()
p_weeks_of_year = p.dt.isocalendar().week.tolist()
p_days_of_year = p.dt.dayofyear.tolist()
p_weekday_name = p.dt.day_name().tolist()
print(p)
print(p_days_of_month)
print(p_weeks_of_year)
print(p_days_of_year)
print(p_weekday_name)

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]
[1, 2, 3, 4, 5, 6]
[53, 5, 9, 14, 19, 23]
[1, 33, 63, 94, 125, 157]
['Friday', 'Wednesday', 'Saturday', 'Thursday', 'Monday', 'Saturday']
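The first week number (53) can look surprising. isocalendar() follows ISO 8601, where week 1 is the week containing the year's first Thursday. A sketch confirming that 2010-01-01, a Friday, falls in week 53 of ISO year 2009:

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(['2010-01-01', '2010-01-04']))

# isocalendar() returns a DataFrame with ISO year, week and day columns
iso = ser.dt.isocalendar()
print(iso['year'].tolist())  # [2009, 2010]
print(iso['week'].tolist())  # [53, 1]
```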


**Problem 24**

From ser, extract words that contain at least 2 vowels.

In [283]:
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# Count vowel occurrences in each lower-cased word, then keep words with 2 or more
vowel_count = ser.str.lower().str.count('[aeiou]')
ser[vowel_count >= 2]

0     Apple
1    Orange
4     Money
dtype: object
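The counting approach above counts every vowel occurrence, not distinct vowels. A sketch with a made-up word 'Book' showing where the two readings of "at least 2 vowels" differ:

```python
import pandas as pd

ser = pd.Series(['Apple', 'Book', 'Plan'])
lower = ser.str.lower()

# Every occurrence: 'Book' has two o's, so it passes
total = lower.str.count('[aeiou]')

# Distinct vowels: 'Book' has only one kind of vowel, so it fails
distinct = lower.map(lambda word: len(set(word) & set('aeiou')))

print(ser[total >= 2].tolist())     # ['Apple', 'Book']
print(ser[distinct >= 2].tolist())  # ['Apple']
```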

**Problem 25**

Extract the valid emails from the series emails. The regex pattern for valid emails is provided as reference.

In [273]:
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
emails.str.findall(pattern)

0                     []
1    [rameses@egypt.com]
2            [matt@t.co]
3    [narendra@modi.com]
dtype: object
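.str.findall returns a list per element. A sketch that instead filters the Series with str.fullmatch, so the result is a flat Series of the matching addresses:

```python
import pandas as pd

emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com',
                    'matt@t.co', 'narendra@modi.com'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

# str.fullmatch returns a boolean mask (the whole string must match),
# which can be used directly for filtering
valid = emails[emails.str.fullmatch(pattern)]
print(valid.tolist())  # ['rameses@egypt.com', 'matt@t.co', 'narendra@modi.com']
```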

**Problem 26**

Compute the mean of weights of each fruit.

In [272]:
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))

print(weights.groupby(fruit).mean())
# Example of what the two random inputs can look like:
#> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
#> ['banana', 'carrot', 'apple', 'carrot', 'carrot', 'apple', 'banana', 'carrot', 'apple', 'carrot']

apple     6.000000
banana    4.333333
carrot    6.000000
dtype: float64
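Because the inputs above are random, here is a sketch on small fixed data (an assumption) where the group means can be checked by hand. groupby pairs each weight with the fruit at the same index label:

```python
import pandas as pd

fruit = pd.Series(['apple', 'banana', 'apple', 'carrot'])
weights = pd.Series([1.0, 2.0, 5.0, 4.0])

# apple -> (1.0 + 5.0) / 2, banana -> 2.0, carrot -> 4.0
means = weights.groupby(fruit).mean()
print(means.to_dict())  # {'apple': 3.0, 'banana': 2.0, 'carrot': 4.0}
```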


**Problem 27**

Replace the spaces in my_str with the least frequent character.

In [301]:
my_str = 'dbc deb abed gade'
ser_my_str = pd.Series(list(my_str))
# value_counts(ascending=True) lists the rarest character first
least_freq = ser_my_str.value_counts(ascending=True).index[0]
new_str = my_str.replace(' ', least_freq)
print(new_str)


dbccdebcabedcgade
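The approach above counts the spaces too. In this sketch the input is hypothetical, chosen so that the space is itself the rarest character, and the space is filtered out before counting:

```python
import pandas as pd

my_str = 'aaa bbbb'  # hypothetical input: ' ' appears once, 'a' 3 times, 'b' 4 times
chars = pd.Series(list(my_str))

# Counting the space would pick ' ' itself and leave the string unchanged;
# dropping it first picks the rarest real character instead
least_freq = chars[chars != ' '].value_counts(ascending=True).index[0]
result = my_str.replace(' ', least_freq)
print(result)  # aaaabbbb
```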


**Problem 28**

Create a timeseries starting at '2000-01-01' and 10 weekends (Saturdays) after that, with random numbers as values.
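A minimal sketch, reading the task as ten Saturdays beginning with 2000-01-01 itself (that date is a Saturday); use periods=11 and drop the first entry if the start date should not count:

```python
import numpy as np
import pandas as pd

# freq='W-SAT' generates one timestamp per week, anchored on Saturday;
# a start date that already falls on a Saturday is included
saturdays = pd.date_range('2000-01-01', periods=10, freq='W-SAT')
ser = pd.Series(np.random.randint(1, 10, size=10), index=saturdays)
print(ser)
```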

**Problem 29**

Series ser has missing dates and values. Make all missing dates appear and fill them with the value from the previous date.

*   Use .resample() and .ffill() to complete the task




In [None]:
ser = pd.Series([1,10,3,np.nan], index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))
print(ser)
#> 2000-01-01     1.0
#> 2000-01-03    10.0
#> 2000-01-06     3.0
#> 2000-01-08     NaN
#> dtype: float64
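A sketch of the resample-and-fill step on the Series above. resample('D') creates one row per calendar day; our understanding is that the resampler's .ffill() fills only the newly created dates, so a second .ffill() is chained to also fill the NaN already stored at 2000-01-08:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 10, 3, np.nan],
                index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))

# Upsample to daily frequency and forward-fill the new dates, then
# forward-fill again so the original NaN at 2000-01-08 takes the value 3.0
daily = ser.resample('D').ffill().ffill()
print(daily)
```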