# **Pandas Series Practice Exercises**


A pandas Series is a one-dimensional array. It holds any data type supported in Python and uses labels to locate each data value for retrieval. These labels form the index, and they can be strings or integers. A Series is the main data structure in the pandas framework for storing one-dimensional data.
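
As a quick illustration of label-based retrieval, here is a minimal sketch. The Series name `prices` and its fruit labels are made up for this example and do not appear in the exercises below.

```python
import pandas as pd

# Hypothetical example data: three values, each addressed by a string label
prices = pd.Series([1.5, 0.25, 3.0], index=['apple', 'banana', 'cherry'])

by_label = prices['banana']    # look a value up by its index label
by_position = prices.iloc[0]   # look a value up by integer position

print(by_label, by_position)
```

Labels and positions are independent: `.iloc` always counts from zero, whatever the index labels happen to be.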

**Common methods used with Pandas Series**


*.head(), .name, .to_frame(), .reset_index(), .isin(), .value_counts(), .index, pd.qcut(), .take(), pd.concat(), .tolist(), Index.get_loc(), .map(), .diff()*


---


Please refer to the [Pandas Series Documentation](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) for a full list of methods and how to implement them. 

To complete the problems below, we first need to import the pandas and NumPy libraries. If you haven't already installed them, use pip or conda to install them globally or in your virtual environment.


* The common alias for pandas is pd.  
* The common alias for numpy is np.




In [21]:
#pip install pandas
#pip install numpy 

import pandas as pd
import numpy as np

**Problem 1**

We can easily convert lists, tuples, dictionaries and NumPy arrays into a Pandas Series by passing the Python object directly as an argument to the Series() constructor. We imported pandas under the alias pd in the cell above, so we will use pd.Series() to construct our Series object.



*  Convert the 3 objects below into 3 separate Pandas Series objects.
*  Use the .head() method to see the first 5 items in each Series





In [28]:
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

ser1= pd.Series(mylist)
print(ser1.head(5))

ser2= pd.Series(myarr)
print(ser2.head(5))

ser3= pd.Series(mydict)
print(ser3.head())

0    a
1    b
2    c
3    e
4    d
dtype: object
0    0
1    1
2    2
3    3
4    4
dtype: int32
a    0
b    1
c    2
e    3
d    4
dtype: int32


**Problem 2**

We can name a Pandas Series by passing the name parameter to the constructor.

*   Use the name parameter to add 'alphabet' as the name of our series.
*   Use the .head() method to see the name of our Series displayed



In [29]:
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'), name= 'alphabet')
print(ser.head())

0    a
1    b
2    c
3    e
4    d
Name: alphabet, dtype: object


**Problem 3**

We can also add a name to the Series after it is created by assigning to the .name attribute of our Series object.



*   Use the .name attribute to add the name 'alphabet1' to the Series object after it has been created.
*   Use the .head() method to verify





In [None]:
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser.name= 'alphabet1'
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabet1, dtype: object

**Problem 4**

We can use arithmetic operators, or the equivalent Series methods, to perform calculations on two or more Series.

*   Create a new object that adds the two series below
*   Create a new object that subtracts the two series below
*   Create a new object that multiplies the two series below
*   Create a new object that divides the two series below

Make sure you display your outputs.





In [35]:
ser1= pd.Series([2, 4, 6, 8, 10])
ser2= pd.Series([1, 3, 5, 7, 9])
print(ser1.add(ser2,fill_value=0))
print(ser1.subtract(ser2,fill_value=0))
print(ser1.multiply(ser2,fill_value=0))
print(ser1.divide(ser2,fill_value=0))



0     3
1     7
2    11
3    15
4    19
dtype: int64
0    1
1    1
2    1
3    1
4    1
dtype: int64
0     2
1    12
2    30
3    56
4    90
dtype: int64
0    2.000000
1    1.333333
2    1.200000
3    1.142857
4    1.111111
dtype: float64
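
The `fill_value=0` argument used above makes no difference when both Series share the same index. A minimal sketch of where it does matter, assuming two made-up Series `a` and `b` whose labels only partly overlap:

```python
import pandas as pd

# Hypothetical data: 'z' exists in a but not in b
a = pd.Series([2, 4, 6], index=['x', 'y', 'z'])
b = pd.Series([1, 3], index=['x', 'y'])

plain = a + b                     # 'z' has no partner in b, so the result there is NaN
filled = a.add(b, fill_value=0)   # the missing 'z' in b is treated as 0 before adding

print(plain)
print(filled)
```

Arithmetic between Series aligns on index labels first, which is why the plain `+` operator produces a missing value for the unmatched label.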


**Problem 5**

We can use comparison operators to compare elements of two Series.

Use the series below to perform the following:


*   Check if the elements in ser1 are equal to the elements in ser2
*   Check if the elements in ser1 are greater than the elements in ser2
*   Check if the elements in ser1 are less than the elements in ser2





In [40]:
ser1= pd.Series([2, 4, 5, 8, 10])
ser2= pd.Series([1, 3, 5, 7, 10])
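result = ser1.eq(ser2)  # element-wise comparison; .equals() would return a single bool for the whole Series
print(result)
result1 = ser1.gt(ser2)
print(result1)
result2 = ser1.lt(ser2)
print(result2)


0    False
1    False
2     True
3    False
4     True
dtype: bool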
0     True
1     True
2    False
3     True
4    False
dtype: bool
0    False
1    False
2    False
3    False
4    False
dtype: bool


**Problem 6**

We can add additional elements to our Pandas Series. Use the Series below to perform the following:


*   Create a new Series by using pd.concat() to add 'penguin' and 'lion' to the Series object (the older Series.append() method was removed in pandas 2.0)
*   Use the .reset_index() method to reset the index values.

(Both steps can be done in one line)



In [79]:
ser=pd.Series(['cat', 'dog', 'elephant', 'frog'])
ser1 = pd.Series(['penguin','lion'])
#new_ser = ser.append(ser1)
ser_append = pd.concat([ser, ser1])
s1 = ser_append.reset_index(drop=True)
print(s1)

0         cat
1         dog
2    elephant
3        frog
4     penguin
5        lion
dtype: object


**Problem 7**

We can customize the index of our Series by using the 'index' parameter in our Series constructor, or by assigning to the .index attribute afterwards. Use the Series below to perform the following:


*   Set the index to ['A', 'B', 'C', 'D']
*   Use the .reindex() method after the Series object is created to change the order of the index to ['B', 'A', 'D', 'C']



In [92]:
ser=pd.Series(['cat', 'dog', 'elephant', 'frog'])
ser.index = (['A', 'B','C','D'])
print(ser)
s1 = ser.reindex(['B','A','D','C'])
print(s1)

A         cat
B         dog
C    elephant
D        frog
dtype: object
B         dog
A         cat
D        frog
C    elephant
dtype: object


**Problem 8**

Find the elements in the first series (ser1) not in the second series (ser2).


*   Use the single square bracket notation and the .isin() method with ser1 to remove the items present in ser2. This will leave us with only the items not present in ser2.



In [90]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
res = ser1[~ser1.isin(ser2)]
print(res)


0    1
1    2
2    3
dtype: int64


**Problem 9**

Use the Series below to perform the following:
*   Use a built-in Pandas method to find the mean of the Series
*   Use a built-in Pandas method to find the standard deviation of the Series



In [91]:
ser= pd.Series(data = [1,2,3,4,5,6,7,8,9,5,3])
print(ser.mean(),ser.std())

4.818181818181818 2.522624895547565


**Problem 10**

Find the elements in both series that are not in common (remove them if they exist in both).

*   Create a new Series using np.union1d()
*   Create a new Series using np.intersect1d()
*   Use single square bracket notation and the .isin() method to find the final result





In [None]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
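
The cell above was left unsolved. One possible sketch that follows the three hints is shown here; the names `ser_u`, `ser_i`, and `result` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

ser_u = pd.Series(np.union1d(ser1, ser2))      # every distinct value from either Series
ser_i = pd.Series(np.intersect1d(ser1, ser2))  # values present in both Series

# Keep the values of the union that are not in the intersection
result = ser_u[~ser_u.isin(ser_i)]
print(result)
```

The result keeps the index positions of the union, so call `.reset_index(drop=True)` if a clean 0..n-1 index is wanted.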

**Problem 11**

Find the minimum, maximum, 25th percentile, and 75th percentile in the series.

*   Use np.percentile to find the percentiles above





In [93]:
ser = pd.Series(np.random.normal(10, 5, 25))
print(ser)
print("Min is",ser.min())
print("Max is",ser.max())
print("25th percentile is",np.percentile(ser,25))
print("75th percentile is",np.percentile(ser,75))


0     19.211077
1      9.652195
2      8.891362
3     11.801340
4     16.383624
5      5.472696
6      8.380667
7     10.233639
8      9.725180
9     14.245067
10     9.164407
11     5.122169
12     6.143351
13    16.853259
14    14.853762
15    12.274285
16    11.566249
17    15.596007
18    12.775057
19     5.270422
20    10.014686
21     0.313410
22    18.362134
23    12.150616
24    15.467452
dtype: float64
Min is 0.3134099263184993
Max is 19.211076907669838
25th percentile is 8.891361974649975
75th percentile is 14.853761649278695


**Problem 12**

Obtain the count of each unique item in the series.


*   Use the .value_counts() method to find the frequency count of each unique item




In [97]:
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
print(ser)
result = ser.value_counts()
print(result)

0     a
1     g
2     b
3     b
4     h
5     c
6     f
7     c
8     d
9     f
10    e
11    h
12    c
13    d
14    c
15    d
16    d
17    c
18    a
19    b
20    g
21    g
22    e
23    d
24    g
25    g
26    h
27    h
28    h
29    a
dtype: object
g    5
h    5
c    5
d    5
a    3
b    3
f    2
e    2
Name: count, dtype: int64


**Problem 13**

Keep the two most frequent items in the series and change all items that are not those two into "Other".



*   Use .isin(), .value_counts(), and the .index attribute to complete the task above



In [101]:
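np.random.RandomState(100)  # note: this creates a generator object but does not seed np.random, so the output below is not reproducible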
ser = pd.Series(np.random.randint(1, 5, [12]))
print(ser)
top_two=ser.value_counts().index[:2]
print(top_two)
ser[~ser.isin(top_two)] = 'Other'
print(ser)

0     2
1     4
2     3
3     1
4     1
5     3
6     3
7     1
8     2
9     1
10    2
11    4
dtype: int32
Index([1, 2], dtype='int32')
0         2
1     Other
2     Other
3         1
4         1
5     Other
6     Other
7         1
8         2
9         1
10        2
11    Other
dtype: object


**Problem 14**

Bin the series below into 10 equal-sized quantile bins (deciles) and replace the values with the bin name.

*   Use the pd.qcut() function to complete the task




In [104]:
ser = pd.Series(np.random.random(20))
ser1 = pd.qcut(ser,10)
print(ser1)

0      (0.266, 0.332]
1      (0.695, 0.807]
2      (0.598, 0.695]
3      (0.807, 0.959]
4     (0.0474, 0.112]
5      (0.373, 0.504]
6      (0.504, 0.598]
7      (0.266, 0.332]
8      (0.332, 0.373]
9     (0.0474, 0.112]
10     (0.112, 0.206]
11     (0.332, 0.373]
12     (0.206, 0.266]
13     (0.807, 0.959]
14     (0.598, 0.695]
15     (0.373, 0.504]
16     (0.206, 0.266]
17     (0.504, 0.598]
18     (0.112, 0.206]
19     (0.695, 0.807]
dtype: category
Categories (10, interval[float64, right]): [(0.0474, 0.112] < (0.112, 0.206] < (0.206, 0.266] < (0.266, 0.332] ... (0.504, 0.598] < (0.598, 0.695] < (0.695, 0.807] < (0.807, 0.959]]


**Problem 15**

Find the positions of numbers that are multiples of 3 in the Series below.

*  Use np.argwhere() to complete the task 




In [108]:
import numpy as np
ser = pd.Series(np.random.randint(1, 10, 7))
print(ser)
np.argwhere(ser.to_numpy()%3 ==0 )

0    5
1    3
2    4
3    9
4    8
5    5
6    2
dtype: int32


array([[1],
       [3]], dtype=int64)

**Problem 16**

From ser, extract the items at positions in list pos.

*   Use the .take() method to complete the task



In [109]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]
ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object

**Problem 17**

Stack ser1 and ser2 vertically.

*   Use pd.concat() to complete the task (the older Series.append() method was removed in pandas 2.0)



In [110]:
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))
vert_ser = pd.concat([ser1,ser2],ignore_index= True)
print(ser1, ser2, vert_ser, sep='\n')

0    0
1    1
2    2
3    3
4    4
dtype: int64
0    a
1    b
2    c
3    d
4    e
dtype: object
0    0
1    1
2    2
3    3
4    4
5    a
6    b
7    c
8    d
9    e
dtype: object


**Problem 18**

Get the positions of items of ser2 in ser1 as a list.

*   Use a list comprehension, pd.Index, and .get_loc() to complete the task



In [None]:
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])
ser1_index = pd.Index(ser1)
# Look up the position in ser1 of each item in ser2
new_list = [ser1_index.get_loc(i) for i in ser2]
print(new_list)

**Problem 19**

Change the first character of each word in ser to upper case.

*   Use the .map() method combined with a string method to complete the task




In [46]:
ser = pd.Series(['how', 'to', 'kick', 'butt?'])
ser_map = ser.map(str.capitalize)
print(ser_map)

0      How
1       To
2     Kick
3    Butt?
dtype: object


**Problem 20**

Calculate the number of characters for each element in the series.

*   Use the .map() method combined with a method to find the length of the string to complete the task.



In [50]:
ser = pd.Series(['how', 'to', 'kick', 'butt?'])
ser_len = ser.map(len)
print(ser_len)

0    3
1    2
2    4
3    5
dtype: int64


**Problem 21**

Calculate the difference of differences between the consecutive numbers of ser. Display the output as a list.

*   Use the .diff() method to the complete the task



In [None]:
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
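
This cell was left unsolved. A minimal sketch of one way to do it, applying .diff() twice; the name `second_diff` is made up for this example.

```python
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# The first .diff() gives consecutive differences; the second gives the differences of those
second_diff = ser.diff().diff().tolist()
print(second_diff)
```

The first two entries are NaN because each .diff() call has no previous element to subtract from at the start.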

**Problem 22**

Convert the date-strings to a timeseries.

*   Use the pd.to_datetime() function to complete the task


In [57]:
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
ser_date = pd.to_datetime(ser,format="mixed")
print(ser_date)

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]


**Problem 23**

Import the parse function from dateutil.parser and use it for the following:

*   Create a new object using the .map() method to parse ser
*   Create a new list of all the days of the month using the .dt.day attribute of the parsed series
*   Create a new list of all the weeks of the year using .dt.isocalendar().week on the parsed series (the older .dt.weekofyear has been removed from recent pandas releases)
*   Create a new list of all of the days of the year using the .dt.dayofyear attribute of the parsed series
*   Create a new list of all the days of the week using the .dt.day_name() method of the parsed series (the older .dt.weekday_name has been removed from recent pandas releases)





In [49]:
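from dateutil.parser import parse  # this import was missing, which caused the NameError

ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
# Parse each string, then make sure the result has a datetime64 dtype so .dt works
p = pd.to_datetime(ser.map(parse))
p_days_of_month = p.dt.day.tolist()
p_weeks_of_year = p.dt.isocalendar().week.tolist()
p_days_of_year = p.dt.dayofyear.tolist()
p_weekday_name = p.dt.day_name().tolist()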

**Problem 24**

From ser, extract words that contain at least 2 vowels.

In [52]:
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])
vowels = list('aeiou')
#output = ser[ser.apply(lambda word: sum([1 for vowel in vowels if vowel in word.lower()]) >= 2)]
#print(output)
twoMvowels = ser.str.lower().str.count('[aeiou]')
ser[twoMvowels>=2]

0     Apple
1    Orange
4     Money
dtype: object

**Problem 25**

Extract the valid emails from the series emails. The regex pattern for valid emails is provided as reference.

In [None]:
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
pattern ='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}'
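
This cell was left unsolved. One possible sketch uses .str.contains() to build a boolean mask from the provided pattern; the names `mask` and `valid` are chosen for illustration, and the pattern is written as a raw string that is equivalent to the one above.

```python
import pandas as pd

emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

# True where the string contains something that matches the email pattern
mask = emails.str.contains(pattern, regex=True)
valid = emails[mask]
print(valid)
```

Note that .str.contains() accepts a match anywhere in the string. If whole-string validation is needed, .str.fullmatch() is a stricter alternative.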


**Problem 26**

Compute the mean of weights of each fruit.

In [60]:
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
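# Group the weights by fruit label and take the mean within each group
print(weights.groupby(fruit).mean())
#print(weights.tolist())
#print(fruit.tolist())
#examples
#> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
#> ['banana', 'carrot', 'apple', 'carrot', 'carrot', 'apple', 'banana', 'carrot', 'apple', 'carrot']
#> per-fruit means for the example lists above: apple 6.0, banana 4.0, carrot 5.8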


**Problem 27**

Replace the spaces in my_str with the least frequent character.

In [None]:
my_str = 'dbc deb abed gade'
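
This cell was left unsolved. A minimal sketch follows, with one assumption stated up front: in this string two characters tie for the lowest count, so the sketch breaks the tie by picking the one that appears first. The names `freq`, `candidates`, and `least_freq` are made up for this example.

```python
import pandas as pd

my_str = 'dbc deb abed gade'

# Count how often each character occurs
freq = pd.Series(list(my_str)).value_counts()

# Several characters can share the lowest count; break the tie by first appearance
candidates = freq[freq == freq.min()].index
least_freq = min(candidates, key=my_str.index)

result = my_str.replace(' ', least_freq)
print(result)
```

Relying on the last row of .value_counts() instead would also work, but the order of tied entries is an implementation detail, so the explicit tie-break keeps the result predictable.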

**Problem 28**

Create a time series of 10 weekends (Saturdays) starting at ‘2000-01-01’, with random numbers as values.

In [61]:
indx = pd.date_range(start = '2000-01-01', periods = 10, freq = '7D')
vals = np.random.rand(10)
pd.Series(vals, index = indx)

2000-01-01    0.104820
2000-01-08    0.059012
2000-01-15    0.599987
2000-01-22    0.958355
2000-01-29    0.589985
2000-02-05    0.676544
2000-02-12    0.007458
2000-02-19    0.214179
2000-02-26    0.260809
2000-03-04    0.600870
Freq: 7D, dtype: float64

**Problem 29**

Series ser has missing dates and values. Make all missing dates appear and fill up with value from previous date.

*   Use .resample() and .ffill() to complete the task




In [None]:
ser = pd.Series([1,10,3,np.nan], index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))
print(ser)
#> 2000-01-01     1.0
#> 2000-01-03    10.0
#> 2000-01-06     3.0
#> 2000-01-08     NaN
#> dtype: float64
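
This cell was left unsolved. A minimal sketch of the hinted approach, resampling to daily frequency and forward filling; the name `filled` is made up for this example.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 10, 3, np.nan],
                index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))

# Upsample to one row per calendar day; new dates take the value of the previous existing date
filled = ser.resample('D').ffill()
print(filled)
```

The NaN on 2000-01-08 stays as it is, because that date already exists in the original Series and only newly created dates are filled. Chaining a second `.ffill()` on the result would fill it too, if that is what is wanted.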