### 23. How to convert year-month strings to dates corresponding to the 4th day of the month?
- Change ser to dates that fall on the 4th of the respective months.

In [16]:
# Input
import pandas as pd
import numpy as np

ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

In [17]:
# Solution 1
from dateutil.parser import parse

# Parse the date
ser_ts = ser.map(lambda x: parse(x))

# Construct a date string with the day set to 04
ser_datestr = ser_ts.dt.year.astype('str') + '-' + ser_ts.dt.month.astype('str') + '-' + '04'

# Format it
[parse(i).strftime('%Y-%m-%d') for i in ser_datestr]

# Solution 2
ser.map(lambda x: parse('04 ' + x))

0   2010-01-04
1   2011-02-04
2   2012-03-04
dtype: datetime64[ns]
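
As an alternative sketch, the same result can be obtained without dateutil by prepending the day to each string and parsing with an explicit format in `pd.to_datetime`. This assumes English abbreviated month names, as in the input above; the name `ser_4th` is only illustrative.

```python
import pandas as pd

ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

# Prepend the day, then parse as "day abbreviated-month year"
ser_4th = pd.to_datetime('04 ' + ser, format='%d %b %Y')
print(ser_4th)
```

Passing an explicit `format` avoids relying on the parser to guess which token is the day.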

### 24. How to filter words that contain at least 2 vowels from a series?
- From ser, extract words that contain at least 2 vowels.
- get(a, b): the first argument is the dictionary key to look up, and the second is the value to return when that key is missing. If the key exists, its value is returned.
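
A minimal sketch of how `get` behaves on a `Counter`, using the made-up word `'orange'` as input. The variable names `counts` and `n_vowels` are illustrative only.

```python
from collections import Counter

# Count how often each letter occurs in the word
counts = Counter('orange')

print(counts.get('a', 0))  # key present: returns its count
print(counts.get('u', 0))  # key missing: returns the default value 0

# Summing the counts of all five vowels gives the number of vowels in the word
n_vowels = sum(counts.get(v, 0) for v in 'aeiou')
print(n_vowels)
```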

In [18]:
#Input

ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

In [19]:
# Solution

from collections import Counter
mask = ser.map(lambda x: sum([Counter(x.lower()).get(i, 0) for i in list('aeiou')]) >= 2)
ser[mask]

0     Apple
1    Orange
4     Money
dtype: object
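
The same filter can also be sketched with the vectorized string method `Series.str.count`, which counts regex matches per element. This is an alternative to the `Counter` approach above, not a replacement for it; `vowel_counts` and `result` are illustrative names.

```python
import re
import pandas as pd

ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# Count vowels in each word, ignoring case
vowel_counts = ser.str.count(r'[aeiou]', flags=re.IGNORECASE)

# Keep only the words with at least 2 vowels
result = ser[vowel_counts >= 2]
print(result)
```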

### 25. How to filter valid emails from a series?
#### Extract the valid emails from the series emails. The regex pattern for valid emails is provided as reference.
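- [a-zA-Z]: any letter of the alphabet
- [0-9]: any digit
- Dot (.): in a regular expression, the dot metacharacter matches any character except the newline \n. Written as \. it matches a literal dot.
- +: one or more repetitions of the preceding element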


In [20]:
# Input
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])

In [21]:
# Solution 1 (as series of strings)
import re
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
mask = emails.map(lambda x: bool(re.match(pattern, x)))
emails[mask]

# Solution 2 (as series of lists)
emails.str.findall(pattern, flags=re.IGNORECASE)

# Solution 3 (as list)
[x[0] for x in [re.findall(pattern, email) for email in emails] if len(x) > 0]

['rameses@egypt.com', 'matt@t.co', 'narendra@modi.com']
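
Note that `re.match` only anchors the pattern at the start of the string, so an entry that begins with a valid address and continues with other text would also pass. The sketch below contrasts `Series.str.match` with `Series.str.fullmatch`, which requires the whole string to match. It assumes pandas 1.1 or later (where `str.fullmatch` is available), and the last entry in `emails` is a made-up test case that is not in the original input.

```python
import pandas as pd

pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com',
                    'matt@t.co', 'narendra@modi.com',
                    'matt@t.co and more text'])  # hypothetical extra entry

# Matches when the string merely starts with an email address
starts_with_email = emails[emails.str.match(pattern)]

# Matches only when the entire string is an email address
whole_string_email = emails[emails.str.fullmatch(pattern)]

print(starts_with_email.tolist())
print(whole_string_email.tolist())
```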

### 26. How to get the mean of a series grouped by another series?
- Compute the mean of weights of each fruit.

In [22]:
# Input
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
print(weights.tolist())
print(fruit.tolist())

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
['apple', 'apple', 'apple', 'apple', 'carrot', 'banana', 'banana', 'carrot', 'apple', 'banana']


In [23]:
# Solution
weights.groupby(fruit).mean()

apple     3.800000
banana    7.666667
carrot    6.500000
dtype: float64
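
Because `np.random.choice` is not seeded, the input and the group means change on every run. The sketch below fixes `fruit` to the labels printed above so the result is reproducible, and checks one group by hand with a boolean mask. `means` and `apple_mean` are illustrative names.

```python
import numpy as np
import pandas as pd

# Labels copied from the printed output above, so the result is deterministic
fruit = pd.Series(['apple', 'apple', 'apple', 'apple', 'carrot',
                   'banana', 'banana', 'carrot', 'apple', 'banana'])
weights = pd.Series(np.linspace(1, 10, 10))

# Group the weights by the aligned fruit labels and average each group
means = weights.groupby(fruit).mean()
print(means)

# Manual check for a single group: select the apple rows and average them
apple_mean = weights[fruit == 'apple'].mean()
print(apple_mean)
```

The grouping works because both series share the same index, so each weight is paired with the label at the same position.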

### 27. How to compute the euclidean distance between two series?
- Compute the euclidean distance between series (points) p and q, without using a packaged formula.

In [24]:
# Input
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

In [25]:
# Solution
sum((p - q)**2)**.5

18.16590212458495
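
The solution evaluates the square root of the sum of squared element-wise differences. As a sanity check, the sketch below computes the same quantity with the vectorized `Series.sum` and compares it against `np.linalg.norm`, which is a packaged formula and is used here only for verification.

```python
import numpy as np
import pandas as pd

p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# Manual formula: square the differences, sum them, take the square root
dist = ((p - q) ** 2).sum() ** 0.5

# Packaged equivalent, used only to verify the manual result
check = np.linalg.norm((p - q).to_numpy())

print(dist, check)
```

The squared differences sum to 330, so both values equal the square root of 330.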

### 28. How to find all the local maxima (or peaks) in a numeric series?
#### Get the positions of peaks (values surrounded by smaller values on both sides) in ser.
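- np.sign(): returns 1 for positive values, -1 for negative values, and 0 for zero.
- np.diff(): computes the n-th order difference between consecutive array elements.
- c = np.array([3, 4, 5, 6, 7])
- np.where(6 == c)
- -> (array([3]),)
- np.where(6 == c)[0]
- -> array([3])
- np.where(6 == c)[0][0]
- -> 3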
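
The three helpers from the notes above can be tried in isolation. This is a small sketch on made-up arrays; all variable names are illustrative.

```python
import numpy as np

# np.sign maps each element to -1, 0, or 1
signs = np.sign(np.array([-5, 0, 2]))
print(signs)

# np.diff returns differences between consecutive elements;
# n=2 applies the difference twice
squares = np.array([1, 4, 9, 16])
first_diff = np.diff(squares)
second_diff = np.diff(squares, n=2)
print(first_diff, second_diff)

# np.where with a single condition returns a tuple of index arrays,
# so [0] selects the array and [0][0] the first matching index
c = np.array([3, 4, 5, 6, 7])
idx = np.where(c == 6)[0][0]
print(idx)
```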

In [26]:
# Input
ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

In [27]:
# Solution
dd = np.diff(np.sign(np.diff(ser)))
peak_locs = np.where(dd == -2)[0] + 1
peak_locs

array([1, 5, 7], dtype=int64)
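
To see why `dd == -2` marks a peak, the one-liner can be unpacked into its intermediate steps. The sketch below uses illustrative names (`slopes`, `directions`, `turns`) that do not appear in the original solution.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

# Step 1: difference between neighbours (positive = rising, negative = falling)
slopes = np.diff(ser.to_numpy())

# Step 2: keep only the direction of each step (+1 or -1)
directions = np.sign(slopes)

# Step 3: a change from +1 (rising) to -1 (falling) gives -1 - (+1) = -2
turns = np.diff(directions)

# Step 4: shift by one because each turn sits between two steps
peak_locs = np.where(turns == -2)[0] + 1

print(slopes)
print(directions)
print(turns)
print(peak_locs)
```

With this approach the first and last elements are never reported as peaks, and a flat-topped peak such as `[1, 3, 3, 1]` is not detected, because a zero slope produces turns of -1 instead of -2.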