### 1. How to import pandas and check the version?

##### solution:

In [73]:
import pandas as pd
import numpy as np
# The installed version is available as a module attribute
print(pd.__version__)

### 2. How to create a series from a list, numpy array and dict?

##### Input:

In [3]:
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

##### solution:

In [4]:
ser1=pd.Series(mylist)
ser2=pd.Series(myarr)
ser3=pd.Series(mydict)


### 3. How to convert the index of a series into a column of a dataframe?

##### Input:

##### solution:

In [6]:
df=ser3.to_frame().reset_index()

### 4. How to combine many series to form a dataframe?

##### Input:

In [7]:
import numpy as np
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

##### solution:

In [10]:
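# Option 1: build from a dict, which also names the columns
df = pd.DataFrame({'col1': ser1, 'col2': ser2})
# Option 2: concatenate side by side; unnamed series get the column labels 0 and 1
df = pd.concat([ser1, ser2], axis=1)
df.head(10)

   0  1
0  a  0
1  b  1
2  c  2
3  e  3
4  d  4
5  f  5
6  g  6
7  h  7
8  i  8
9  j  9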


### 5. How to assign name to the series’ index?

##### Input:

In [12]:
# Give a name to the series ser calling it ‘alphabets’.

ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))

##### solution:

In [15]:
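# rename() returns a new series with the given name
ser = ser.rename('alphabets')
# Assigning to the attribute does the same in place: ser.name = 'alphabets'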
ser

0     a
1     b
2     c
3     e
4     d
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: alphabets, dtype: object

### 6. How to get the items of series A not present in series B?

##### Input:

In [18]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

##### solution:

In [19]:
ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64

### 7. How to get the items not common to both series A and series B?

##### Input:

##### solution:

In [23]:
s1=ser1[~ser1.isin(ser2)]
s2=ser2[~ser2.isin(ser1)]
s3=pd.concat([s1,s2],axis=0)
s3

0    1
1    2
2    3
2    6
3    7
4    8
dtype: int64
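
The same result can also be sketched with NumPy's set routines. This is only an alternative under stated assumptions: `np.setxor1d` returns sorted unique values, so it fits when duplicates and the original index labels do not matter. The name `sym_diff` is illustrative.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Sorted unique values that appear in exactly one of the two series
sym_diff = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))
print(sym_diff.tolist())
```

Unlike the `concat` approach, the result gets a fresh 0..n-1 index.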

### 8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

##### Input:

In [29]:
ser = pd.Series(np.random.normal(10, 5, 25))
ser

0     -0.972182
1      8.648608
2     13.944948
3      5.071427
4     11.417400
5      6.985468
6     16.367837
7      8.117317
8      9.531630
9     11.045474
10     9.603233
11    11.786360
12     6.567174
13     5.997962
14     4.831721
15     4.258047
16     6.936236
17    19.825673
18    18.647195
19    14.150069
20     5.718198
21    15.625295
22    19.038119
23    11.108401
24    13.378535
dtype: float64

##### solution:

In [38]:
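# Quantiles 0, 0.25, 0.5, 0.75 and 1 are the minimum, 25th percentile, median, 75th percentile and maximum
quar = ser.quantile([0, 0.25, 0.5, 0.75, 1])
quar

0.00    -0.972182
0.25     6.567174
0.50     9.603233
0.75    13.944948
1.00    19.825673
dtype: float64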

### 9. How to get frequency counts of unique items of a series?

##### Input:

In [39]:
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

##### solution:

In [43]:
# Building a dict of counts by hand would also work,
# but value_counts() does this directly
ser.value_counts(sort=True)

h    7
e    4
c    4
g    4
b    4
f    3
d    3
a    1
Name: count, dtype: int64

### 10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?

##### Input:

In [60]:
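# Note: this line only constructs a RandomState object and discards it. It does not
# seed the global generator, so the values shown below come from an unseeded run.
np.random.RandomState(100)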
ser = pd.Series(np.random.randint(1, 5, [12]))

##### solution:

In [61]:
ser

0     4
1     2
2     1
3     4
4     2
5     2
6     4
7     1
8     2
9     4
10    1
11    1
dtype: int64

In [62]:
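# The two most frequent values (ties keep their order of first appearance)
top2 = ser.value_counts().index[:2]
# Cast to object first so assigning a string does not trigger an incompatible-dtype warning
ser = ser.astype(object)
ser[~ser.isin(top2)] = 'Other'
ser

0         4
1         2
2     Other
3         4
4         2
5         2
6         4
7     Other
8         2
9         4
10    Other
11    Other
dtype: object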

### 11. How to bin a numeric series to 10 groups of equal size?

##### Input:

In [63]:
ser = pd.Series(np.random.random(20))

##### solution:

In [65]:
pd.qcut(ser,10, labels=np.arange(10))

0     0
1     9
2     5
3     6
4     0
5     5
6     4
7     8
8     4
9     3
10    2
11    6
12    1
13    1
14    2
15    9
16    7
17    3
18    8
19    7
dtype: category
Categories (10, int64): [0 < 1 < 2 < 3 ... 6 < 7 < 8 < 9]

### 12. How to convert a numpy array to a dataframe of given shape?

##### Input:

In [66]:
# Reshape the series ser into a dataframe with 7 rows and 5 columns

ser = pd.Series(np.random.randint(1, 10, 35))

##### solution:

In [75]:
# Reshape the underlying array to 7x5, then wrap it in a dataframe
df = pd.DataFrame(ser.values.reshape(7, 5))
df

   0  1  2  3  4
0  9  5  3  6  7
1  7  8  2  6  9
2  4  4  6  5  1
3  6  9  9  3  7
4  2  8  6  5  5
5  9  9  3  5  4
6  4  3  7  2  5

### 13. How to find the positions of numbers that are multiples of 3 from a series?

##### Input:

In [None]:
ser = pd.Series(np.random.randint(1, 10, 7))

##### solution:

In [80]:
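# The input cell above was never executed (In [None]), so `ser` here is still the
# 35-element series from exercise 12; the positions below refer to that series.
# With the default RangeIndex, index labels and positions coincide.
ser[ser % 3 == 0].index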

Index([0, 2, 3, 8, 9, 12, 15, 16, 17, 18, 22, 25, 26, 27, 31], dtype='int64')

### 14. How to extract items at given positions from a series?

##### Input:

In [82]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

##### solution:

In [84]:
# iloc selects by position regardless of the index labels
ser.iloc[pos].values

array(['a', 'e', 'i', 'o', 'u'], dtype=object)

### 15. How to stack two series vertically and horizontally?

##### Input:

In [85]:
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

##### solution:

In [86]:
ser3 = pd.concat([ser1, ser2], axis=1)  # horizontal: side by side as dataframe columns
ser4 = pd.concat([ser1, ser2], axis=0)  # vertical: one longer series
print(ser3)
print(ser4)

   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e
0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object


### 16. How to get the positions of items of series A in another series B?

##### Input:

In [89]:
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

##### solution:

In [90]:
ser1[ser1.isin(ser2)].index

Index([0, 4, 5, 8], dtype='int64')
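
The boolean-mask solution returns positions in the order they occur in `ser1`. If the positions should instead follow the order of the items in `ser2`, one possible sketch uses `Index.get_indexer`. It assumes the values of `ser1` are unique; the name `positions` is illustrative.

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# For each item of ser2, in order, look up its position in ser1.
# get_indexer needs unique values in ser1 and reports -1 for items it cannot find.
positions = pd.Index(ser1).get_indexer(pd.Index(ser2)).tolist()
print(positions)
```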

### 17. How to compute the mean squared error on a truth and predicted series?

##### Input:

In [91]:
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

##### solution:

In [94]:
# truth 
pred

0    0.984018
1    1.258840
2    2.731631
3    3.116805
4    4.546428
5    5.949387
6    6.649334
7    7.926292
8    8.396566
9    9.015606
dtype: float64

In [105]:
MSE=np.square(truth-pred).mean()
MSE

0.42212966655501943

### 18. How to convert the first character of each element in a series to uppercase?

##### Input:

In [111]:
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

##### solution:

In [117]:
# Vectorized equivalent of applying str.capitalize to each element
ser.str.capitalize()

0     How
1      To
2    Kick
3    Ass?
dtype: object

### 19. How to calculate the number of characters in each word in a series?

##### Input:

##### solution:

In [122]:
# Vectorized equivalent of applying len to each element
ser.str.len()

0    3
1    2
2    4
3    4
dtype: int64

### 20. How to compute difference of differences between consecutive numbers of a series?

##### Input:

In [123]:
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

##### solution:

In [128]:
# dif=[]
# for i in range(1,len(ser)):
#     dif.append(ser[i]-ser[i-1])
# dif

# diff() returns the difference between each element and the one before it,
# so applying it twice gives the difference of differences
ser.diff().diff()

0    NaN
1    NaN
2    1.0
3    1.0
4    1.0
5    1.0
6    0.0
7    2.0
dtype: float64

### 21. How to convert a series of date-strings to a timeseries?

##### Input:

In [130]:
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

##### solution:

In [142]:
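# Parse each string on its own, since the elements use different formats
ser.apply(lambda x: pd.to_datetime(x))
# In pandas 2.0+, pd.to_datetime(ser, format='mixed') is the vectorized alternative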

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]

### 22. How to get the day of month, week number, day of year and day of week from a series of date strings?

##### Input:

In [143]:
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

##### solution:

In [160]:
dates=ser.apply(lambda x: pd.to_datetime(x))
day_of_month = dates.dt.day
week_number = dates.dt.isocalendar().week
day_of_year = dates.dt.day_of_year
day_of_week = dates.dt.day_of_week
# Display one of the results; the output below is the day of year
day_of_year

0      1
1     33
2     63
3     94
4    125
5    157
dtype: int32

### 23. How to convert year-month string to dates corresponding to the 4th day of the month?

##### Input:

In [161]:
ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

##### solution:

In [164]:
# Prepend the day, separated by a space, before parsing
ser.apply(lambda x: pd.to_datetime('04 ' + x))

0   2010-01-04
1   2011-02-04
2   2012-03-04
dtype: datetime64[ns]

### 24. How to filter words that contain at least 2 vowels from a series?

##### Input:

In [165]:
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

##### solution:

In [175]:
def count_vowel(x):
    # Count the vowels in x, ignoring case
    return sum(ch in 'aeiou' for ch in x.lower())

# Keep only the words with two or more vowels
ser[ser.apply(count_vowel) >= 2]

0     Apple
1    Orange
4     Money
dtype: object
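
A vectorized variant is also possible with the string accessor. This is a sketch rather than the notebook's own method: `Series.str.count` counts regex matches per element, and the names `vowel_counts` and `result` are illustrative.

```python
import pandas as pd

ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# Number of vowels (either case) in each word
vowel_counts = ser.str.count(r'[aeiouAEIOU]')
# Keep the words with two or more vowels
result = ser[vowel_counts >= 2]
print(result.tolist())
```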

### 25. How to filter valid emails from a series?

##### Input:

In [177]:
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
pattern ='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}'

##### solution:

In [179]:
import re

In [189]:
mask=emails.apply(lambda x: bool(re.search(pattern,x)))
emails[mask]

1    rameses@egypt.com
2            matt@t.co
3    narendra@modi.com
dtype: object

### 26. How to get the mean of a series grouped by another series?

##### Input:

In [190]:
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
print(weights.tolist())
print(fruit.tolist())

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
['carrot', 'carrot', 'banana', 'carrot', 'apple', 'banana', 'banana', 'apple', 'carrot', 'apple']


##### solution:

In [194]:
weights.groupby(fruit).mean()

apple     7.666667
banana    5.333333
carrot    4.000000
dtype: float64

### 27. How to compute the euclidean distance between two series?

##### Input:

In [195]:
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

##### solution:

In [197]:
# Sum the squared differences before taking the square root;
# without the sum the result is just the element-wise absolute difference
np.sqrt(np.sum(np.square(p - q)))

18.16590212458495

### 28. How to find all the local maxima (or peaks) in a numeric series?

##### Input:

##### solution:
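
No input or solution is recorded for this exercise. One possible approach, sketched on a made-up series, looks at where the sign of the step between neighbours flips from rising to falling. The input values and the names `step_sign` and `peaks` are assumptions for illustration. Endpoints and flat plateaus are not reported as peaks by this sketch.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])  # illustrative input

# Sign of each step between neighbours: +1 rising, -1 falling, 0 flat
step_sign = np.sign(np.diff(ser.to_numpy()))
# A rise followed directly by a fall changes the sign by -2;
# add 1 to turn the step position into the position of the peak itself
peaks = np.where(np.diff(step_sign) == -2)[0] + 1
print(peaks.tolist())
```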

### 29. How to replace missing spaces in a string with the least frequent character?

##### Input:

##### solution:

### 30. How to create a TimeSeries starting ‘2000-01-01’ and 10 weekends (Saturdays) after that, having random numbers as values?

##### solution:
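
No solution is recorded here. A minimal sketch, assuming "10 weekends" means ten consecutive Saturdays beginning with 2000-01-01 (itself a Saturday) and that random integers from 1 to 9 are acceptable values:

```python
import numpy as np
import pandas as pd

# Weekly frequency anchored on Saturday, ten periods starting at the given date
dates = pd.date_range('2000-01-01', periods=10, freq='W-SAT')
# Random integers in [1, 10) as the values (an arbitrary choice for illustration)
ser = pd.Series(np.random.randint(1, 10, size=10), index=dates)
print(ser)
```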

### 31. How to fill an intermittent time series so all missing dates show up with values of previous non-missing date?

##### Input:

##### solution:
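
No input or solution is recorded here. One way to approach it, sketched on a made-up series with gaps, is to upsample to daily frequency and forward-fill. The dates, values and the name `filled` are assumptions for illustration.

```python
import pandas as pd

# Illustrative series with missing calendar days between observations
ser = pd.Series(
    [1, 10, 3, 7],
    index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']),
)

# Upsample to one row per day and carry the last observed value forward
filled = ser.resample('D').ffill()
print(filled)
```

Replacing `ffill()` with `bfill()` would fill each gap from the next observed date instead.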

### 32. How to compute the autocorrelations of a numeric series?

##### Input:

##### solution:
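
No input or solution is recorded here. A possible sketch uses `Series.autocorr`, which computes the Pearson correlation between the series and a lagged copy of itself. The input series, the choice of the first 10 lags and the name `autocorrs` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ser = pd.Series(np.arange(20) + rng.normal(1, 10, 20))  # illustrative input

# Correlation of the series with itself shifted by 1, 2, ..., 10 positions
autocorrs = [ser.autocorr(lag=lag) for lag in range(1, 11)]
print([round(a, 2) for a in autocorrs])
```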

### 33. How to import only every nth row from a csv file to create a dataframe?

##### Input:

##### solution:

### 34. How to change column values when importing csv to a dataframe?

##### Input:

##### solution:

### 35. How to create a dataframe with rows as strides from a given series?

##### Input:

##### solution:

### 36. How to import only specified columns from a csv file?

##### Input:

##### solution:

### 37. How to get the nrows, ncolumns, datatype, summary stats of each column of a dataframe? Also get the array and list equivalent.

##### Input:

##### solution:

### 38. How to extract the row and column number of a particular cell with given criterion?

##### Input:

##### solution:

### 39. How to rename specific columns in a dataframe?

##### Input:

##### solution:

### 40. How to check if a dataframe has any missing values?

##### Input:

##### solution:
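
No input or solution is recorded here. A minimal sketch on a made-up dataframe: `isna()` marks the missing cells, and reducing the resulting boolean array with `any()` answers the question for the whole frame. The data and the name `has_missing` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', 'z']})  # illustrative data

# True if at least one cell anywhere in the dataframe is missing
has_missing = bool(df.isna().to_numpy().any())
print(has_missing)
```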

### 41. How to count the number of missing values in each column?

##### Input:

##### solution:
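
No input or solution is recorded here. A minimal sketch on a made-up dataframe: summing the boolean mask from `isna()` counts the missing cells column by column. The data and the name `missing_per_column` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': ['x', None, 'z'],
    'c': [1, 2, 3],
})  # illustrative data

# Each True in the mask counts as 1, so the column sums are the missing counts
missing_per_column = df.isna().sum()
print(missing_per_column)
```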

### 42. How to replace missing values of multiple numeric columns with the mean?

##### Input:

##### solution:

### 43. How to use apply function on existing columns with global variables as additional arguments?

##### Input:

##### solution:

### 44. How to select a specific column from a dataframe as a dataframe instead of a series?

##### Input:

##### solution:

### 45. How to change the order of columns of a dataframe?

##### Input:

##### solution:

### 46. How to set the number of rows and columns displayed in the output?

##### Input:

##### solution:

### 47. How to format or suppress scientific notations in a pandas dataframe?

##### Input:

##### solution:

### 48. How to format all the values in a dataframe as percentages?

##### Input:

##### solution:

### 49. How to filter every nth row in a dataframe?

##### Input:

##### solution:

### 50. How to create a primary key index by combining relevant columns?

##### Input:

##### solution:

### 51. How to get the row number of the nth largest value in a column?

##### Input:

##### solution:

### 52. How to find the position of the nth largest value greater than a given value?

##### Input:

##### solution:

### 53. How to get the last n rows of a dataframe with row sum > 100?

##### Input:

##### solution:

### 54. How to find and cap outliers from a series or dataframe column?

##### Input:

##### solution:

### 55. How to reshape a dataframe to the largest possible square after removing the negative values?

##### Input:

##### solution:

### 56. How to swap two rows of a dataframe?

##### Input:

##### solution:

### 57. How to reverse the rows of a dataframe?

##### Input:

##### solution:
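
No input or solution is recorded here. One common approach, sketched on a made-up dataframe, is positional slicing with a negative step. The data and the names `reversed_df` and `reversed_reset` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))  # illustrative data

# A step of -1 walks the rows from last to first; index labels travel with their rows
reversed_df = df.iloc[::-1]
# Optionally renumber the rows from 0 after reversing
reversed_reset = reversed_df.reset_index(drop=True)
print(reversed_reset)
```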

### 58. How to create one-hot encodings of a categorical variable (dummy variables)?

##### Input:

##### solution:

### 59. Which column contains the highest number of row-wise maximum values?

##### Input:

##### solution:

### 60. How to create a new column that contains the row number of nearest column by euclidean distance?

##### Input:

##### solution:

### 61. How to know the maximum possible correlation value of each column against other columns?

##### Input:

##### solution:

### 62. How to create a column containing the minimum by maximum of each row?

##### Input:

##### solution:

### 63. How to create a column that contains the penultimate value in each row?

##### Input:

##### solution:

### 64. How to normalize all columns in a dataframe?

##### Input:

##### solution:

### 65. How to compute the correlation of each row with the succeeding row?

##### Input:

##### solution:

### 66. How to replace both the diagonals of dataframe with 0?

##### Input:

##### solution:

### 67. How to get the particular group of a groupby dataframe by key?

##### Input:

##### solution:

### 68. How to get the n’th largest value of a column when grouped by another column?

##### Input:

##### solution:

### 69. How to compute grouped mean on pandas dataframe and keep the grouped column as another column (not index)?

##### Input:

##### solution:
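
No input or solution is recorded here. A minimal sketch on a made-up dataframe: passing `as_index=False` to `groupby` keeps the grouping key as an ordinary column in the result. The column names `fruit` and `price` and the name `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'banana'],
    'price': [10.0, 4.0, 14.0, 6.0],
})  # illustrative data

# as_index=False returns the group key as a regular column instead of the index
out = df.groupby('fruit', as_index=False)['price'].mean()
print(out)
```

Calling `reset_index()` on an ordinary grouped result gives the same layout.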

### 70. How to join two dataframes by 2 columns so they have only the common rows?

##### Input:

##### solution:

### 71. How to remove rows from a dataframe that are present in another dataframe?

##### Input:

##### solution:

### 72. How to get the positions where values of two columns match?

##### Input:

##### solution:

### 73. How to create lags and leads of a column in a dataframe?

##### Input:

##### solution:
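
No input or solution is recorded here. A minimal sketch on a made-up column: `shift` with a positive argument produces a lag, and with a negative argument a lead. The column names `a_lag1` and `a_lead1` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})  # illustrative data

# Lag: each row sees the value from the previous row (first row becomes NaN)
df['a_lag1'] = df['a'].shift(1)
# Lead: each row sees the value from the next row (last row becomes NaN)
df['a_lead1'] = df['a'].shift(-1)
print(df)
```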

### 74. How to get the frequency of unique values in the entire dataframe?

##### Input:

##### solution:

### 75. How to split a text column into two separate columns?

##### Input:

##### solution: