___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:120%; text-align:center; border-radius:10px 10px;">Way to Reinvent Yourself</p>

<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">Working with Text & Time Data</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [WORKING WITH TEXT DATA](#0)
* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#00)
* [WORKING WITH TEXT DATA](#1)
    * [String Methods](#1.1)
    * [Most Useful String Methods](#1.2)
    * [Dummy Operations](#1.3)
* [WORKING WITH TIME DATA](#2)
    * [pd.to_datetime()](#2.1)
    * [Series.dt()](#2.2)
    * [Datetime Module](#2.3)
    * [Series.dt()](#2.4)
* [OPERATION WITH DATETIME OBJECT](#3)
* [THE END OF THE SESSION](#4)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="00"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [3]:
import numpy as np
import pandas as pd
import seaborn as sns

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Working with Text Data</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In this notebook, we first look at string operations on a basic Series/Index and then learn how to apply these string functions to a DataFrame.

Pandas provides a set of string functions that make it easy to operate on string data. Most importantly, these functions skip missing/NaN values instead of raising an error. Almost all of them mirror Python's built-in string methods [Refer To Official Python Documentation](https://docs.python.org/3/library/stdtypes.html#string-methods). They are reached through the `.str` accessor of a Series/Index, so the column must hold string values (convert it first if it does not) before the operation is applied.

In addition, according to [Pandas Official Document](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html), there are two ways to store text data in pandas:
- `object`-dtype NumPy array.
- `StringDtype` extension type.

Pandas recommends using `StringDtype` to store text data.

[SOURCE01](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html), [SOURCE02](https://www.w3schools.com/python/python_ref_string.asp)

# PearDeck

There are two ways to store text data in pandas:
1. `object`-dtype NumPy array.
2. `StringDtype` extension type.

We recommend using `StringDtype` to store text data.

Prior to pandas 1.0, `object` dtype was the only option. This was unfortunate for several reasons:
1. You can accidentally store a mixture of strings and non-strings in an `object`-dtype array. It is better to have a dedicated dtype.
2. `object` dtype breaks dtype-specific operations like `DataFrame.select_dtypes()`. There is no clear way to select just text while excluding columns that are non-text but still `object` dtype.
3. When reading code, the contents of an `object`-dtype array are less clear than `'string'`.
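As a small illustration of why a dedicated string dtype helps, the sketch below uses a made-up `ages_object` Series (an assumption, not the course dataset) that mixes integers with one string. On `object` dtype, `.str` methods return NaN for elements that are not Python strings; after `astype("string")` every element is text.

```python
import pandas as pd

# Hypothetical sample: integers mixed with one string, as can happen
# when a spreadsheet column contains a placeholder such as "-"
ages_object = pd.Series([52, 48, "-", 35], dtype="object")

# On object dtype, .str methods return NaN for non-string elements
object_result = ages_object.str.isnumeric()

# Casting to the dedicated string dtype turns every element into text
ages_string = ages_object.astype("string")
string_result = ages_string.str.isnumeric()

print(object_result.tolist())   # NaN for the integers, False for "-"
print(string_result.tolist())   # [True, True, False, True]
print(ages_string.dtype)        # string
```

The cast is only a sketch of the idea; whether to keep such a column as text or convert it to a numeric dtype depends on the analysis.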

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">String Methods</p>

<a id="1.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Strings implement all of the common sequence operations, along with the additional methods described at [the official documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).

Strings also support two styles of string formatting, one providing a large degree of flexibility and customization (**Please see the information about** [str.format()](https://docs.python.org/3/library/stdtypes.html#str.format), [Format String Syntax](https://docs.python.org/3/library/string.html#formatstrings) and [Custom String Formatting](https://docs.python.org/3/library/string.html#string-formatting)) and the other based on C printf style formatting that handles a narrower range of types and is slightly harder to use correctly, but is often faster for the cases it can handle ([printf-style String Formatting](https://docs.python.org/3/library/stdtypes.html#old-string-formatting)).

The [Text Processing Services](https://docs.python.org/3/library/text.html#textservices) section of the standard library covers a number of other modules that provide various text related utilities (including regular expression support in the [re](https://docs.python.org/3/library/re.html#module-re) module).

Please watch [**``Video Source``**](https://www.youtube.com/watch?v=6JNwK6hEneg) for enhancing your understanding of working with Text Data in Pandas.  

**What are these String Methods? Let us now examine some of the most common and useful String Methods one by one:**

# PearDeck
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

In [4]:
df0 = pd.read_excel("text_exercise.xlsx")
df = df0.copy()
df

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
2,E0001,Micheal Brown,IT,data scientist,"""$150,000""",35
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32
6,E0005,Adrian STAR,IT,data scientist,"""$135,000""",40
7,E0006,Albert simon,IT,data scientist,125000dolar,35


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          8 non-null      object
 1   staff       8 non-null      object
 2   department  8 non-null      object
 3   job         8 non-null      object
 4   salary      8 non-null      object
 5   age         8 non-null      object
dtypes: object(6)
memory usage: 512.0+ bytes


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Most Useful String Methods</p>

<a id="1.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

# PearDeck
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

- **str.lower() =>** Converts a string into lower case
- **str.upper() =>** Converts a string into upper case
- **str.capitalize() =>** Converts the first character to upper case
- **str.title() =>** Converts the first character of each word to upper case
- **str.swapcase() =>** Swaps the case lower/upper

[SOURCE01](https://www.tutorialspoint.com/python_pandas/python_pandas_working_with_text_data.htm)
[SOURCE02](https://www.aboutdatablog.com/post/10-most-useful-string-functions-in-pandas)
[SOURCE03](https://towardsdatascience.com/5-must-know-pandas-operations-on-strings-4f88ca6b8e25)
[SOURCE04](https://towardsdatascience.com/pandas-string-operations-explained-fdfab7602fb4)
[SOURCE05](https://blog.devgenius.io/string-operations-on-pandas-dataframe-88af220439d1)
[SOURCE06](https://www.geeksforgeeks.org/string-manipulations-in-pandas-dataframe/)

___

In [6]:
df.staff.str.lower()

0         tom blue
1       john black
2    micheal brown
3     jason walker
4       alex green
5     oscar smi̇th
6      adrian star
7     albert simon
Name: staff, dtype: object

In [7]:
df.staff.str.upper()

0         TOM BLUE
1       JOHN BLACK
2    MICHEAL BROWN
3     JASON WALKER
4       ALEX GREEN
5      OSCAR SMİTH
6      ADRIAN STAR
7     ALBERT SIMON
Name: staff, dtype: object

In [8]:
df.staff.str.title()

0         Tom Blue
1       John Black
2    Micheal Brown
3     Jason Walker
4       Alex Green
5     Oscar Smi̇th
6      Adrian Star
7     Albert Simon
Name: staff, dtype: object

In [9]:
df.staff.str.capitalize()

0         Tom blue
1       John black
2    Micheal brown
3     Jason walker
4       Alex green
5     Oscar smi̇th
6      Adrian star
7     Albert simon
Name: staff, dtype: object

In [10]:
df.staff.str.swapcase()

0         tOM blue
1       john black
2    mICHEAL bROWN
3     JASON WALKER
4       aLEX gREEN
5     oscar smi̇th
6      aDRIAN star
7     aLBERT SIMON
Name: staff, dtype: object

___

In [11]:
arr = np.array(["ali", "veli", "20"])
dframe = pd.DataFrame(arr, columns = ["isim"])

In [12]:
dframe.isim.str.swapcase()

0     ALI
1    VELI
2      20
Name: isim, dtype: object

- **str.isalpha() =>** Returns True if all characters in the string are alphabetic
- **str.isnumeric() =>** Returns True if all characters in the string are numeric
- **str.isalnum() =>** Returns True if all characters in the string are alphanumeric
- **str.endswith() =>** Returns True if the string ends with the specified value
- **str.startswith() =>** Returns True if the string starts with the specified value
- **str.contains() =>** Returns True for each element that contains the given substring or pattern, else False

[SOURCE01](https://careerkarma.com/blog/python-isalpha-isnumeric-isalnum/)
[SOURCE02](https://careerkarma.com/blog/python-startswith-and-endswith/)
[SOURCE03](https://www.geeksforgeeks.org/python-startswith-endswidth-function/)
[SOURCE04](https://towardsdatascience.com/check-for-a-substring-in-a-pandas-dataframe-column-4b949f64852#:~:text=The%20contains%20method%20in%20Pandas,str.)

In [13]:
df

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
2,E0001,Micheal Brown,IT,data scientist,"""$150,000""",35
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32
6,E0005,Adrian STAR,IT,data scientist,"""$135,000""",40
7,E0006,Albert simon,IT,data scientist,125000dolar,35


___

**isalpha()** checks whether each string consists of alphabetic characters only. It returns True when every character is a letter (and the string is not empty) and False as soon as any other character, such as a space or a digit, is present.

In [14]:
df.job.str.isalpha()
# a space does not count as an alphabetic character

0     True
1     True
2    False
3     True
4    False
5    False
6    False
7    False
Name: job, dtype: bool

**isnumeric()** checks whether all characters in each string are numeric. This is equivalent to running the Python string method str.isnumeric() for each element of the Series/Index.

In [15]:
"10".isnumeric()

True

In [16]:
"10a".isnumeric()

False

In [17]:
df.age.str.isnumeric()
# NaN appears because the column has object dtype and most values are integers, not strings; only "-" is a string

0      NaN
1      NaN
2      NaN
3      NaN
4    False
5      NaN
6      NaN
7      NaN
Name: age, dtype: object

In [18]:
df.age.astype("string").str.isnumeric()

0     True
1     True
2     True
3     True
4    False
5     True
6     True
7     True
Name: age, dtype: boolean

**isalnum()** checks whether each string consists of alphanumeric characters only. It returns True when every character is either a letter or a number, and False when any other character (for example "$", "," or a quote) is present.

In [19]:
df.salary.str.isalnum()

0    False
1    False
2    False
3     True
4    False
5    False
6    False
7     True
Name: salary, dtype: bool

Pandas **startswith()** tests if the start of each string element matches a pattern. It is yet another way to search and filter text data in a Series or DataFrame. It is similar to Python's startswith() method, but it has different parameters and works on pandas objects only, so it must be called through the .str accessor (for example, df.job.str.startswith("d")).

In [20]:
df.job.str.startswith("d")

0    False
1    False
2     True
3    False
4    False
5    False
6     True
7     True
Name: job, dtype: bool

In [21]:
df.job.str.startswith("da")

0    False
1    False
2     True
3    False
4    False
5    False
6     True
7     True
Name: job, dtype: bool

Pandas **endswith()** tests whether the end of each string element matches a given sequence of characters.

In [22]:
df.job.str.endswith("r")

0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
Name: job, dtype: bool

In [23]:
df.job.str.endswith("er")

0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
Name: job, dtype: bool

In [24]:
df[df.job.str.endswith("er")]

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32


In [25]:
df[["job"]][df.job.str.endswith("er")]

Unnamed: 0,job
0,manager
1,manager
3,recruiter
4,backend developer
5,frontend developer


The **contains()** method in Pandas allows you to search a column for a specific substring or regular expression. It returns a boolean Series that is True where the original value contains the pattern and False where it does not [SOURCE](https://towardsdatascience.com/check-for-a-substring-in-a-pandas-dataframe-column-4b949f64852#:~:text=The%20contains%20method%20in%20Pandas,str.).
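The sketch below shows three optional parameters of `str.contains()` that are often useful in practice: `case`, `regex` and `na`. The `jobs` Series is made-up sample data (an assumption, not the course dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with mixed case, a missing value and regex-special characters
jobs = pd.Series(
    ["Data Scientist", "backend developer", np.nan, "C++ developer"],
    dtype="object",
)

# case=False makes the match case-insensitive;
# na=False treats missing values as "no match" instead of returning NaN
has_data = jobs.str.contains("data", case=False, na=False)

# regex=False searches for the literal text, so "+" is not read as a regex quantifier
has_cpp = jobs.str.contains("C++", regex=False, na=False)

print(jobs[has_data].tolist())  # ['Data Scientist']
print(jobs[has_cpp].tolist())   # ['C++ developer']
```

Using `na=False` keeps the result a plain boolean mask, which is what row selection with `df[mask]` expects.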

In [26]:
df.job.str.contains("per")
# checks whether the substring occurs anywhere in each value

0    False
1    False
2    False
3    False
4     True
5     True
6    False
7    False
Name: job, dtype: bool

In [27]:
df[df.job.str.contains("per")]

Unnamed: 0,id,staff,department,job,salary,age
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32


In [28]:
df.salary.str.contains("[a-z]+")
# checks whether each value contains one or more lowercase a-z characters

0    False
1    False
2    False
3     True
4    False
5    False
6    False
7     True
Name: salary, dtype: bool

In [29]:
df[df.salary.str.contains("[a-z]+")]

Unnamed: 0,id,staff,department,job,salary,age
3,E0002,jason walker,HR,recruiter,130000dolar,38
7,E0006,Albert simon,IT,data scientist,125000dolar,35


In [30]:
df.job.str.contains("ata")

0    False
1    False
2     True
3    False
4    False
5    False
6     True
7     True
Name: job, dtype: bool

In [31]:
df[df.job.str.contains("ata")]

Unnamed: 0,id,staff,department,job,salary,age
2,E0001,Micheal Brown,IT,data scientist,"""$150,000""",35
6,E0005,Adrian STAR,IT,data scientist,"""$135,000""",40
7,E0006,Albert simon,IT,data scientist,125000dolar,35


In [32]:
df.loc[df.job.str.contains("data"), "department"] = "DS"

In [33]:
df

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
2,E0001,Micheal Brown,DS,data scientist,"""$150,000""",35
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32
6,E0005,Adrian STAR,DS,data scientist,"""$135,000""",40
7,E0006,Albert simon,DS,data scientist,125000dolar,35


In [34]:
df.loc[df.job.str.contains("data"), "department"] = "IT"

In [35]:
df

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
2,E0001,Micheal Brown,IT,data scientist,"""$150,000""",35
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32
6,E0005,Adrian STAR,IT,data scientist,"""$135,000""",40
7,E0006,Albert simon,IT,data scientist,125000dolar,35


We can use these string methods, which return boolean values, to build conditions and select the matching rows.
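Boolean results from several string methods can also be combined. The following sketch, built on a small made-up `staff` frame (an assumption for illustration), combines two masks with `&` and negates one with `~`.

```python
import pandas as pd

# Hypothetical sample data
staff = pd.DataFrame({
    "job": ["manager", "data scientist", "backend developer", "frontend developer"],
    "department": ["HR", "IT", "IT", "IT"],
})

is_developer = staff.job.str.endswith("developer")
is_backend = staff.job.str.startswith("back")

# & keeps rows where both conditions hold; ~ inverts a mask
backend_developers = staff[is_developer & is_backend]
non_developers = staff[~is_developer]

print(backend_developers.job.tolist())  # ['backend developer']
print(non_developers.job.tolist())      # ['manager', 'data scientist']
```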

___

- **str.strip() =>** Returns a trimmed version of the string (leading and trailing whitespace, or the given characters, removed)

- **str.replace() =>** Returns a string where each occurrence of a specified pattern is replaced with a specified value

- **str.split() =>** Splits the string at the specified separator, and returns a list

- **str.find() =>** Searches the string for a specified value and returns the position where it was first found (-1 if it was not found)

- **str.findall() =>** Returns a list of all occurrences of the pattern

- **str.join() =>** Joins the elements of an iterable into a single string, using the given separator

In [36]:
df

Unnamed: 0,id,staff,department,job,salary,age
0,M0001,Tom BLUE,HR,manager,"""$150,000""",52
1,M0002,JOHN BLACK,IT,manager,"""$180,000""",48
2,E0001,Micheal Brown,IT,data scientist,"""$150,000""",35
3,E0002,jason walker,HR,recruiter,130000dolar,38
4,E0003,Alex Green,IT,backend developer,"""$110,000""",-
5,E0004,OSCAR SMİTH,IT,frontend developer,"""$120,000""",32
6,E0005,Adrian STAR,IT,data scientist,"""$135,000""",40
7,E0006,Albert simon,IT,data scientist,125000dolar,35


**NOTE:** To use and understand strip better, please review escape characters in Python: [Source01 for Escape Characters](https://www.python-ds.com/python-3-escape-sequences) & [Source02 for Escape Characters](https://www.w3schools.com/python/gloss_python_escape_characters.asp)

In [37]:
df.salary.str.strip("\"")
# escape sequence

0       $150,000
1       $180,000
2       $150,000
3    130000dolar
4       $110,000
5       $120,000
6       $135,000
7    125000dolar
Name: salary, dtype: object

In [38]:
df.salary.str.strip('"')

0       $150,000
1       $180,000
2       $150,000
3    130000dolar
4       $110,000
5       $120,000
6       $135,000
7    125000dolar
Name: salary, dtype: object

In [39]:
df.salary.str.strip("\"").str.strip("dolar")
# or rstrip

0    $150,000
1    $180,000
2    $150,000
3      130000
4    $110,000
5    $120,000
6    $135,000
7      125000
Name: salary, dtype: object

In [40]:
df.salary.str.strip("\"").str.strip("dolar").str.lstrip("$")

0    150,000
1    180,000
2    150,000
3     130000
4    110,000
5    120,000
6    135,000
7     125000
Name: salary, dtype: object

In [41]:
df.salary.str.strip("\"dolar$")

0    150,000
1    180,000
2    150,000
3     130000
4    110,000
5    120,000
6    135,000
7     125000
Name: salary, dtype: object
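One caveat worth keeping in mind: the argument of `strip()` is a set of characters, not a substring. The sketch below uses made-up values (an assumption, not the course dataset) to show how `strip("dolar")` can remove more than intended, and how a regex anchored at the end of the string removes only the literal suffix.

```python
import pandas as pd

# Hypothetical values; the second one starts with letters that are also in "dolar"
amounts = pd.Series(["130000dolar", "road100dolar"])

# strip() removes ANY of the characters d, o, l, a, r from both ends
stripped = amounts.str.strip("dolar")

# An end-anchored regex removes only the literal suffix "dolar"
suffix_only = amounts.str.replace("dolar$", "", regex=True)

print(stripped.tolist())     # ['130000', '100']
print(suffix_only.tolist())  # ['130000', 'road100']
```

For the salary column above the character-set behavior is harmless, because the values begin with digits, `$` or a quote.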

In [42]:
df.salary.str.strip("\"dolar$").replace(",", "")
# Series.replace only replaces values that are exactly "," (whole-value match), so nothing changes here

0    150,000
1    180,000
2    150,000
3     130000
4    110,000
5    120,000
6    135,000
7     125000
Name: salary, dtype: object

In [43]:
df.salary.str.strip("\"dolar$").str.replace(",", "")
# Series.str.replace works on substrings inside each element, so the commas are removed

0    150000
1    180000
2    150000
3    130000
4    110000
5    120000
6    135000
7    125000
Name: salary, dtype: object

In [44]:
#df.salary.str.strip("\"dolar$").str.replace(",", 9)
# does not run: the replacement passed to str.replace must be a string (or a callable), not an int

In [45]:
df.salary.str.strip("\"dolar$").str.replace(",", "").astype(int)

0    150000
1    180000
2    150000
3    130000
4    110000
5    120000
6    135000
7    125000
Name: salary, dtype: int32

In [46]:
df.salary = df.salary.str.strip("\"dolar$").str.replace(",", "").astype(int)

In [47]:
df.salary

0    150000
1    180000
2    150000
3    130000
4    110000
5    120000
6    135000
7    125000
Name: salary, dtype: int32

In [48]:
df.age

0    52
1    48
2    35
3    38
4     -
5    32
6    40
7    35
Name: age, dtype: object

In [49]:
df.age.replace("-", np.nan)
# Series.replace works here because the whole value is "-"
# df.age.str.replace("-", np.nan) # gives an error

0    52.0
1    48.0
2    35.0
3    38.0
4     NaN
5    32.0
6    40.0
7    35.0
Name: age, dtype: float64

In [50]:
df.age = df.age.replace("-", np.nan)
df.age

0    52.0
1    48.0
2    35.0
3    38.0
4     NaN
5    32.0
6    40.0
7    35.0
Name: age, dtype: float64

In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   id          8 non-null      object 
 1   staff       8 non-null      object 
 2   department  8 non-null      object 
 3   job         8 non-null      object 
 4   salary      8 non-null      int32  
 5   age         7 non-null      float64
dtypes: float64(1), int32(1), object(4)
memory usage: 480.0+ bytes


___

### ``str.replace()`` vs ``.replace()``

- **Purpose:** Use **str.replace** for substring replacements on a single string column, and **replace** for any general replacement on one or more columns.

- **Usage:** **str.replace** replaces one pattern per call. **replace** lets you perform multiple independent replacements, i.e., replace many things at once.

- **Default behavior:** **replace** only performs a full match unless the regex=True switch is used. **str.replace** always works on substrings; before pandas 2.0 it treated the pattern as a regular expression by default, while from pandas 2.0 on the default is regex=False, so it is safest to pass regex explicitly.
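The differences listed above can be seen side by side in the following sketch. The `salary` Series is made-up sample data (an assumption), and `regex` is passed explicitly so the result does not depend on the installed pandas version.

```python
import pandas as pd

# Hypothetical sample values
salary = pd.Series(["$150,000", "130000dolar", "-"])

# Series.replace matches whole values by default: "," never equals a full value
unchanged = salary.replace(",", "")

# A dict performs several whole-value replacements at once
whole_value = salary.replace({"-": "0", "$150,000": "150000"})

# Series.str.replace works on substrings inside each element
substring = salary.str.replace(r"[$,]|dolar", "", regex=True)

# Series.replace can also replace substrings when regex=True is given
substring_via_replace = salary.replace(r"[$,]|dolar", "", regex=True)

print(unchanged.tolist())    # ['$150,000', '130000dolar', '-']
print(whole_value.tolist())  # ['150000', '130000dolar', '0']
print(substring.tolist())    # ['150000', '130000', '-']
```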

**Indexing with .str[]**

You can use [] notation on .str to index or slice each element by position [SOURCE](https://pandas.pydata.org/pandas-docs/version/0.15/text.html).
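A minimal sketch of positional indexing and slicing with `.str[]`, using a made-up `ids` Series (an assumption modeled on the id column):

```python
import pandas as pd

# Hypothetical ids: one letter followed by a four-digit number
ids = pd.Series(["M0001", "E0002", "E0003"])

first_char = ids.str[0]    # first character of each element
number_part = ids.str[1:]  # slice: everything after the first character
last_char = ids.str[-1]    # negative positions count from the end

print(first_char.tolist())               # ['M', 'E', 'E']
print(number_part.tolist())              # ['0001', '0002', '0003']
print(number_part.astype(int).tolist())  # [1, 2, 3]
```

The same notation works on lists stored in a Series, which is why it can pick one part of a `str.split()` result.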

In [52]:
df.staff

0         Tom BLUE
1       JOHN BLACK
2    Micheal Brown
3     jason walker
4       Alex Green
5      OSCAR SMİTH
6      Adrian STAR
7     Albert simon
Name: staff, dtype: object

In [53]:
df.staff = df.staff.str.title()
df.staff

0         Tom Blue
1       John Black
2    Micheal Brown
3     Jason Walker
4       Alex Green
5     Oscar Smi̇th
6      Adrian Star
7     Albert Simon
Name: staff, dtype: object

In [54]:
df.staff.str.split()

0         [Tom, Blue]
1       [John, Black]
2    [Micheal, Brown]
3     [Jason, Walker]
4       [Alex, Green]
5     [Oscar, Smi̇th]
6      [Adrian, Star]
7     [Albert, Simon]
Name: staff, dtype: object

In [55]:
df.staff.str.split(pat = " ")

0         [Tom, Blue]
1       [John, Black]
2    [Micheal, Brown]
3     [Jason, Walker]
4       [Alex, Green]
5     [Oscar, Smi̇th]
6      [Adrian, Star]
7     [Albert, Simon]
Name: staff, dtype: object

In [56]:
df.staff.str.split()[0]
# [0] selects the first row of the result (index label 0)

['Tom', 'Blue']

In [57]:
df.staff.str.split().str[0]
# .str[0] takes the first element of each list (here, the first name)

0        Tom
1       John
2    Micheal
3      Jason
4       Alex
5      Oscar
6     Adrian
7     Albert
Name: staff, dtype: object

In [58]:
df["Name"] = df.staff.str.split().str[0]
df["Surname"] = df.staff.str.split().str[1]
df

Unnamed: 0,id,staff,department,job,salary,age,Name,Surname
0,M0001,Tom Blue,HR,manager,150000,52.0,Tom,Blue
1,M0002,John Black,IT,manager,180000,48.0,John,Black
2,E0001,Micheal Brown,IT,data scientist,150000,35.0,Micheal,Brown
3,E0002,Jason Walker,HR,recruiter,130000,38.0,Jason,Walker
4,E0003,Alex Green,IT,backend developer,110000,,Alex,Green
5,E0004,Oscar Smi̇th,IT,frontend developer,120000,32.0,Oscar,Smi̇th
6,E0005,Adrian Star,IT,data scientist,135000,40.0,Adrian,Star
7,E0006,Albert Simon,IT,data scientist,125000,35.0,Albert,Simon
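As an alternative sketch for the same task, `str.split()` accepts `expand=True`, which returns the parts as DataFrame columns in one call. The `staff` Series below is made-up sample data (an assumption); `n=1` limits the split to the first space so that any further words stay in the second column.

```python
import pandas as pd

# Hypothetical sample names
staff = pd.Series(["Tom Blue", "John Black", "Oscar Smith"])

# expand=True returns a DataFrame with one column per part
names = staff.str.split(n=1, expand=True)
names.columns = ["Name", "Surname"]

print(names["Name"].tolist())     # ['Tom', 'John', 'Oscar']
print(names["Surname"].tolist())  # ['Blue', 'Black', 'Smith']
```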


In [59]:
df.drop("staff", axis = 1, inplace = True)

In [58]:
df

Unnamed: 0,id,department,job,salary,age,Name,Surname
0,M0001,HR,manager,150000,52.0,Tom,Blue
1,M0002,IT,manager,180000,48.0,John,Black
2,E0001,IT,data scientist,150000,35.0,Micheal,Brown
3,E0002,HR,recruiter,130000,38.0,Jason,Walker
4,E0003,IT,backend developer,110000,,Alex,Green
5,E0004,IT,frontend developer,120000,32.0,Oscar,Smi̇th
6,E0005,IT,data scientist,135000,40.0,Adrian,Star
7,E0006,IT,data scientist,125000,35.0,Albert,Simon


In [60]:
df.job

0               manager
1               manager
2        data scientist
3             recruiter
4     backend developer
5    frontend developer
6        data scientist
7        data scientist
Name: job, dtype: object

**str.find** returns, for each string in the Series/Index, the lowest index at which the substring is found within [start:end]. It returns -1 on failure. Equivalent to standard str.find().

**str.rfind** returns, for each string in the Series/Index, the highest index at which the substring is found within [start:end]. It returns -1 on failure. Equivalent to standard str.rfind().
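The optional `start` argument mentioned above can be sketched as follows, using two made-up job titles (an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample
jobs = pd.Series(["manager", "frontend developer"])

first_e = jobs.str.find("e")          # lowest index of "e"
last_e = jobs.str.rfind("e")          # highest index of "e"
e_from_6 = jobs.str.find("e", start=6)  # search only from position 6 onward

print(first_e.tolist())   # [5, 5]
print(last_e.tolist())    # [5, 16]
print(e_from_6.tolist())  # [-1, 10]
```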

In [59]:
df.job.str.find("developer")
# -1 means not found; any other value is the index at which "developer" starts

0   -1
1   -1
2   -1
3   -1
4    8
5    9
6   -1
7   -1
Name: job, dtype: int64

In [64]:
df.job.str.find("e")

0    5
1    5
2    8
3    1
4    4
5    5
6    8
7    8
Name: job, dtype: int64

In [65]:
df.job.str.rfind("e")

0     5
1     5
2     8
3     7
4    15
5    16
6     8
7     8
Name: job, dtype: int64

**str.findall** finds all occurrences of pattern or regular expression in the Series/Index [SOURCE](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.findall.html).

In [60]:
df.job.str.findall("developer")
# returns every match found in each element as a list

0             []
1             []
2             []
3             []
4    [developer]
5    [developer]
6             []
7             []
Name: job, dtype: object

In [61]:
df.job.str.findall("d")

0        []
1        []
2       [d]
3        []
4    [d, d]
5    [d, d]
6       [d]
7       [d]
Name: job, dtype: object

In [62]:
df.job.str.findall("e")

0             [e]
1             [e]
2             [e]
3          [e, e]
4    [e, e, e, e]
5    [e, e, e, e]
6             [e]
7             [e]
Name: job, dtype: object

In [63]:
df.job.str.findall("d").apply(len)
# counts how many times "d" occurs in each row

0    0
1    0
2    1
3    0
4    2
5    2
6    1
7    1
Name: job, dtype: int64
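If only the number of matches is needed, `str.count()` gives the same result without building the intermediate lists. A minimal sketch with made-up job titles (an assumption):

```python
import pandas as pd

# Hypothetical sample
jobs = pd.Series(["manager", "data scientist", "backend developer"])

via_findall = jobs.str.findall("d").apply(len)  # list of matches, then its length
via_count = jobs.str.count("d")                 # counts matches directly

print(via_count.tolist())  # [0, 1, 2]
```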

In [66]:
df["skills"] = [[],["Java","C++"],["Python","Tableau","SQL"],[],["React","Django"],["JavaScript","Python"],["R","SQL"],["SQL","Python"]]
df["Skills"] = [[],[],["Python","Tableau","SQL"],[],["React","Django"],["JavaScript","Python"],["R","SQL"],["SQL","Python"]]
df.loc[1, "Skills"] = "Java,C++"
df

Unnamed: 0,id,department,job,salary,age,Name,Surname,skills,Skills
0,M0001,HR,manager,150000,52.0,Tom,Blue,[],[]
1,M0002,IT,manager,180000,48.0,John,Black,"[Java, C++]","Java,C++"
2,E0001,IT,data scientist,150000,35.0,Micheal,Brown,"[Python, Tableau, SQL]","[Python, Tableau, SQL]"
3,E0002,HR,recruiter,130000,38.0,Jason,Walker,[],[]
4,E0003,IT,backend developer,110000,,Alex,Green,"[React, Django]","[React, Django]"
5,E0004,IT,frontend developer,120000,32.0,Oscar,Smi̇th,"[JavaScript, Python]","[JavaScript, Python]"
6,E0005,IT,data scientist,135000,40.0,Adrian,Star,"[R, SQL]","[R, SQL]"
7,E0006,IT,data scientist,125000,35.0,Albert,Simon,"[SQL, Python]","[SQL, Python]"


If the elements of a Series are lists themselves, str.join joins the content of these lists using the delimiter passed to the function. This function is equivalent to Python's str.join() [SOURCE](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.join.html).

**Join** lists contained as elements in the Series/Index with passed delimiter.

In [69]:
",".join("clarusway")

'c,l,a,r,u,s,w,a,y'

In [65]:
df.skills.str.join(",")
# joins the elements of each list with a comma, so the lists become plain strings

0                      
1              Java,C++
2    Python,Tableau,SQL
3                      
4          React,Django
5     JavaScript,Python
6                 R,SQL
7            SQL,Python
Name: skills, dtype: object

In [66]:
df.skills.str.join(",")[2]

'Python,Tableau,SQL'

In [67]:
type(df.skills.str.join(",")[2])

str

In [68]:
df.Skills.str.join(",")
# row 1 holds a string, not a list, so its individual characters are joined with commas

0                      
1       J,a,v,a,,,C,+,+
2    Python,Tableau,SQL
3                      
4          React,Django
5     JavaScript,Python
6                 R,SQL
7            SQL,Python
Name: Skills, dtype: object

In [71]:
df.Skills.apply(lambda x : ",".join(x) if type(x) == list else x)
# join with "," if the value is a list; otherwise keep the value unchanged

0                      
1              Java,C++
2    Python,Tableau,SQL
3                      
4          React,Django
5     JavaScript,Python
6                 R,SQL
7            SQL,Python
Name: Skills, dtype: object

In [72]:
[",".join(x) if type(x) == list else x for x in df.Skills]

['',
 'Java,C++',
 'Python,Tableau,SQL',
 '',
 'React,Django',
 'JavaScript,Python',
 'R,SQL',
 'SQL,Python']

In [73]:
df["Skills"] = df.Skills.apply(lambda x : ",".join(x) if type(x) == list else x)
df

Unnamed: 0,id,department,job,salary,age,Name,Surname,skills,Skills
0,M0001,HR,manager,150000,52.0,Tom,Blue,[],
1,M0002,IT,manager,180000,48.0,John,Black,"[Java, C++]","Java,C++"
2,E0001,IT,data scientist,150000,35.0,Micheal,Brown,"[Python, Tableau, SQL]","Python,Tableau,SQL"
3,E0002,HR,recruiter,130000,38.0,Jason,Walker,[],
4,E0003,IT,backend developer,110000,,Alex,Green,"[React, Django]","React,Django"
5,E0004,IT,frontend developer,120000,32.0,Oscar,Smi̇th,"[JavaScript, Python]","JavaScript,Python"
6,E0005,IT,data scientist,135000,40.0,Adrian,Star,"[R, SQL]","R,SQL"
7,E0006,IT,data scientist,125000,35.0,Albert,Simon,"[SQL, Python]","SQL,Python"


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Dummy Operations</p>

<a id="1.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A dataset may contain various types of values, and some of them are categorical. To use categorical values efficiently in analysis and modeling, we create dummy variables. A dummy variable is a binary variable that indicates whether a separate categorical variable takes on a specific value [SOURCE](https://www.geeksforgeeks.org/how-to-create-dummy-variables-in-python-with-pandas/).

### get_dummies()

**Syntax1:** ``pd.get_dummies(data, prefix=None, prefix_sep="_")``<br>
            **OR**<br>
**Syntax2:** ``df["col_name"].str.get_dummies(sep = ",")``

**Parameters:**
- data= input data, e.g. a pandas DataFrame, a Series, or another array-like such as a list or a NumPy array.
- prefix= string to prepend to the dummy column names.
- prefix_sep= separator placed between the prefix and the category value (default "_").
- Return Type: DataFrame of dummy variables.
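The parameters above can be sketched on a whole DataFrame as follows. The `staff` frame is made-up sample data (an assumption); `columns` selects which columns to encode, and `dtype=int` is passed so the dummies are 0/1 integers regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample data
staff = pd.DataFrame({
    "department": ["HR", "IT", "IT"],
    "salary": [150000, 180000, 110000],
})

# Only "department" is encoded; "salary" is passed through unchanged
encoded = pd.get_dummies(staff, columns=["department"], prefix="dept", dtype=int)

print(list(encoded.columns))        # ['salary', 'dept_HR', 'dept_IT']
print(encoded["dept_HR"].tolist())  # [1, 0, 0]
```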

# PearDeck
![image.png](attachment:image.png)

In [74]:
df.department

0    HR
1    IT
2    IT
3    HR
4    IT
5    IT
6    IT
7    IT
Name: department, dtype: object

In [75]:
pd.get_dummies(df.department)

Unnamed: 0,HR,IT
0,1,0
1,0,1
2,0,1
3,1,0
4,0,1
5,0,1
6,0,1
7,0,1


In [71]:
pd.get_dummies(df.department, prefix="Department")

Unnamed: 0,Department_HR,Department_IT
0,1,0
1,0,1
2,0,1
3,1,0
4,0,1
5,0,1
6,0,1
7,0,1


In [72]:
pd.get_dummies(df.department, prefix="Department", prefix_sep="*")

Unnamed: 0,Department*HR,Department*IT
0,1,0
1,0,1
2,0,1
3,1,0
4,0,1
5,0,1
6,0,1
7,0,1


As you can see, two (2) dummy variables are created, one for each of the two categories (HR and IT) of the "department" attribute. We can create dummy variables in Python using the **``get_dummies()``** method.

Passing **``drop_first=True``** drops the first dummy column. With k categories, k-1 dummies already carry all the information, because the dropped category is implied whenever all remaining dummies are 0. Removing this redundant column reduces the correlation among the dummy variables [SOURCE](https://stackoverflow.com/questions/63661560/drop-first-true-during-dummy-variable-creation-in-pandas#:~:text=1%20Answer,correlations%20created%20among%20dummy%20variables.).
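A small sketch with three categories makes the effect of `drop_first=True` easier to see. The `dept` Series is made-up sample data (an assumption), and `dtype=int` is used so the output is 0/1 integers.

```python
import pandas as pd

# Hypothetical sample with three categories
dept = pd.Series(["HR", "IT", "DS", "IT"], name="department")

full = pd.get_dummies(dept, dtype=int)
reduced = pd.get_dummies(dept, drop_first=True, dtype=int)

print(list(full.columns))        # ['DS', 'HR', 'IT']
print(list(reduced.columns))     # ['HR', 'IT']

# The dropped category "DS" is still recoverable: its row is all zeros
print(reduced.iloc[2].tolist())  # [0, 0]
```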

In [76]:
pd.get_dummies(df.department, drop_first=True)
# drops the first dummy column

Unnamed: 0,IT
0,0
1,1
2,1
3,0
4,1
5,1
6,1
7,1


In [77]:
df.Skills

0                      
1              Java,C++
2    Python,Tableau,SQL
3                      
4          React,Django
5     JavaScript,Python
6                 R,SQL
7            SQL,Python
Name: Skills, dtype: object

In [78]:
df.Skills.str.get_dummies(sep=",")
# the skills are separated by commas, so we pass sep=","

Unnamed: 0,C++,Django,Java,JavaScript,Python,R,React,SQL,Tableau
0,0,0,0,0,0,0,0,0,0
1,1,0,1,0,0,0,0,0,0
2,0,0,0,0,1,0,0,1,1
3,0,0,0,0,0,0,0,0,0
4,0,1,0,0,0,0,1,0,0
5,0,0,0,1,1,0,0,0,0
6,0,0,0,0,0,1,0,1,0
7,0,0,0,0,1,0,0,1,0


In [80]:
df.Skills.str.get_dummies(sep=",").add_prefix("Skills_")

Unnamed: 0,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau
0,0,0,0,0,0,0,0,0,0
1,1,0,1,0,0,0,0,0,0
2,0,0,0,0,1,0,0,1,1
3,0,0,0,0,0,0,0,0,0
4,0,1,0,0,0,0,1,0,0
5,0,0,0,1,1,0,0,0,0
6,0,0,0,0,0,1,0,1,0
7,0,0,0,0,1,0,0,1,0


In [73]:
skills_dummy = df.Skills.str.get_dummies(sep=",").add_prefix("Skills_")

In [74]:
df

Unnamed: 0,id,department,job,salary,age,Name,Surname,skills,Skills
0,M0001,HR,manager,150000,52.0,Tom,Blue,[],
1,M0002,IT,manager,180000,48.0,John,Black,"[Java, C++]","Java,C++"
2,E0001,IT,data scientist,150000,35.0,Micheal,Brown,"[Python, Tableau, SQL]","Python,Tableau,SQL"
3,E0002,HR,recruiter,130000,38.0,Jason,Walker,[],
4,E0003,IT,backend developer,110000,,Alex,Green,"[React, Django]","React,Django"
5,E0004,IT,frontend developer,120000,32.0,Oscar,Smi̇th,"[JavaScript, Python]","JavaScript,Python"
6,E0005,IT,data scientist,135000,40.0,Adrian,Star,"[R, SQL]","R,SQL"
7,E0006,IT,data scientist,125000,35.0,Albert,Simon,"[SQL, Python]","SQL,Python"


In [75]:
df_final = df[["department", "job", "salary", "Skills"]]
df_final

Unnamed: 0,department,job,salary,Skills
0,HR,manager,150000,
1,IT,manager,180000,"Java,C++"
2,IT,data scientist,150000,"Python,Tableau,SQL"
3,HR,recruiter,130000,
4,IT,backend developer,110000,"React,Django"
5,IT,frontend developer,120000,"JavaScript,Python"
6,IT,data scientist,135000,"R,SQL"
7,IT,data scientist,125000,"SQL,Python"


In [76]:
df_final.join(skills_dummy)

Unnamed: 0,department,job,salary,Skills,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau
0,HR,manager,150000,,0,0,0,0,0,0,0,0,0
1,IT,manager,180000,"Java,C++",1,0,1,0,0,0,0,0,0
2,IT,data scientist,150000,"Python,Tableau,SQL",0,0,0,0,1,0,0,1,1
3,HR,recruiter,130000,,0,0,0,0,0,0,0,0,0
4,IT,backend developer,110000,"React,Django",0,1,0,0,0,0,1,0,0
5,IT,frontend developer,120000,"JavaScript,Python",0,0,0,1,1,0,0,0,0
6,IT,data scientist,135000,"R,SQL",0,0,0,0,0,1,0,1,0
7,IT,data scientist,125000,"SQL,Python",0,0,0,0,1,0,0,1,0


In [85]:
df_final = df_final.join(skills_dummy)

In [86]:
df_final.drop("Skills", axis = 1 , inplace=True)
df_final

Unnamed: 0,department,job,salary,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau
0,HR,manager,150000,0,0,0,0,0,0,0,0,0
1,IT,manager,180000,1,0,1,0,0,0,0,0,0
2,IT,data scientist,150000,0,0,0,0,1,0,0,1,1
3,HR,recruiter,130000,0,0,0,0,0,0,0,0,0
4,IT,backend developer,110000,0,1,0,0,0,0,1,0,0
5,IT,frontend developer,120000,0,0,0,1,1,0,0,0,0
6,IT,data scientist,135000,0,0,0,0,0,1,0,1,0
7,IT,data scientist,125000,0,0,0,0,1,0,0,1,0


In [87]:
pd.get_dummies(df_final)
# prefixes are added automatically; the remaining categorical columns are replaced by their dummies

Unnamed: 0,salary,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau,department_HR,department_IT,job_backend developer,job_data scientist,job_frontend developer,job_manager,job_recruiter
0,150000,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
1,180000,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0
2,150000,0,0,0,0,1,0,0,1,1,0,1,0,1,0,0,0
3,130000,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1
4,110000,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0
5,120000,0,0,0,1,1,0,0,0,0,0,1,0,0,1,0,0
6,135000,0,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0
7,125000,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0


In [88]:
pd.get_dummies(df_final, drop_first=True)
# same as above, but the first dummy of each categorical column is dropped

Unnamed: 0,salary,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau,department_IT,job_data scientist,job_frontend developer,job_manager,job_recruiter
0,150000,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,180000,1,0,1,0,0,0,0,0,0,1,0,0,1,0
2,150000,0,0,0,0,1,0,0,1,1,1,1,0,0,0
3,130000,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,110000,0,1,0,0,0,0,1,0,0,1,0,0,0,0
5,120000,0,0,0,1,1,0,0,0,0,1,0,1,0,0
6,135000,0,0,0,0,0,1,0,1,0,1,1,0,0,0
7,125000,0,0,0,0,1,0,0,1,0,1,1,0,0,0


In [89]:
df_final = pd.get_dummies(df_final, drop_first=True)
df_final

Unnamed: 0,salary,Skills_C++,Skills_Django,Skills_Java,Skills_JavaScript,Skills_Python,Skills_R,Skills_React,Skills_SQL,Skills_Tableau,department_IT,job_data scientist,job_frontend developer,job_manager,job_recruiter
0,150000,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,180000,1,0,1,0,0,0,0,0,0,1,0,0,1,0
2,150000,0,0,0,0,1,0,0,1,1,1,1,0,0,0
3,130000,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,110000,0,1,0,0,0,0,1,0,0,1,0,0,0,0
5,120000,0,0,0,1,1,0,0,0,0,1,0,1,0,0
6,135000,0,0,0,0,0,1,0,1,0,1,1,0,0,0
7,125000,0,0,0,0,1,0,0,1,0,1,1,0,0,0


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Working with Time Data</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

For anyone who works with time series data on an almost daily basis, the pandas Python package is extremely useful for time series manipulation and analysis. This basic introduction to time series data manipulation with pandas should allow you to get started in your time series analysis. Specific objectives are to show you how to:
- create a date range
- work with timestamp data
- convert string data to a timestamp
- index and slice your time series data in a data frame
- resample your time series for different time period aggregates/summary statistics
- compute a rolling statistic such as a rolling average
- work with missing data
- understand the basics of unix/epoch time
- understand common pitfalls of time series data analysis [SOURCE](https://towardsdatascience.com/basic-time-series-manipulation-with-pandas-4432afee64ea)

In this section, we will introduce how to work with each of these types of date/time data in Pandas. This short section is by no means a complete guide to the time series tools available in Python or Pandas, but instead is intended as a broad overview of how you as a user should approach working with time series [SOURCE](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html).
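
Several of the objectives listed above (creating a date range, slicing by date, resampling, rolling statistics) can be previewed in a few lines. This is only a sketch on a made-up daily series; the name `sales` and its values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical daily series: 14 days of made-up values 0..13
idx = pd.date_range("2021-01-01", periods=14, freq="D")
sales = pd.Series(range(14), index=idx)

# Slice by date strings (both endpoints are included)
first_three = sales.loc["2021-01-01":"2021-01-03"]

# Aggregate into 7-day bins and compute a 3-day rolling average
weekly_total = sales.resample("7D").sum()
rolling_mean = sales.rolling(window=3).mean()

print(first_three)
print(weekly_total)
print(rolling_mean.head())
```

The rolling average starts with missing values because the first two positions do not yet have a full 3-day window.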

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">pd.to_datetime()</p>

<a id="2.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

For more detailed information about the to_datetime() function, please [visit the official documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).

**``pd.to_datetime()``** converts its argument to datetime.

This function converts a **``scalar``**, **``array-like``**, **``Series``** or **``DataFrame/dict-like``** to a pandas datetime object.

**As stated above, many input types are supported, and lead to different output types:**

- **``scalars``** can be int, float, str, datetime object (from stdlib datetime module or numpy). They are converted to Timestamp when possible, otherwise they are converted to datetime.datetime. None/NaN/null scalars are converted to NaT.

- **``array-like``** can contain int, float, str, datetime objects. They are converted to DatetimeIndex when possible, otherwise they are converted to Index with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.

- **``Series``** are converted to Series with datetime64 dtype when possible, otherwise they are converted to Series with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.

- **``DataFrame/dict-like``** are converted to Series with datetime64 dtype. For each row a datetime is created from assembling the various dataframe columns. Column keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.
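
The input types listed above can be tried out on small, made-up values. The names `ts`, `idx`, `parts` and `assembled` below are illustrative assumptions, not part of the dataset used later:

```python
import pandas as pd

# Scalar string -> Timestamp
ts = pd.to_datetime("2021-01-23")

# List with a missing entry -> DatetimeIndex containing NaT
idx = pd.to_datetime(["2021-01-23", None])

# DataFrame with year/month/day columns -> one datetime per row
parts = pd.DataFrame({"year": [2020, 2021], "month": [4, 1], "day": [2, 23]})
assembled = pd.to_datetime(parts)

print(type(ts).__name__)
print(idx)
print(assembled)
```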

[Special Note :](https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html)

Since many datasets contain datetime information in one of their columns, pandas input functions such as [pandas.read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) and [pandas.read_json()](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json) can convert these columns to dates while reading the data. For ``read_csv`` this is done by passing a list of column names to the **``parse_dates``** parameter (``read_json`` offers ``convert_dates`` for the same purpose).
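
As a brief illustration of `parse_dates`, the sketch below reads a tiny in-memory CSV instead of a file on disk. The CSV text and the name `orders` are made up for the example:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "id_product,order_date\n401,2021-01-23\n416,2020-04-02\n"

# parse_dates converts the listed columns while the data is being read
orders = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])
print(orders.dtypes)
```

With this approach, no separate `pd.to_datetime()` call is needed after loading.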

Why are these **``pandas.Timestamp``** objects useful? Let's illustrate the added value with some example cases. Assume that we want to work with the dates in the columns ``order_date`` and ``entry_date`` as datetime objects instead of plain text:


In [77]:
df = pd.read_csv("time_exercise.csv")
df.head()

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date
0,401,2021-01-23,1.0,541.487603,2018-12-04
1,416,2020-04-02,1.0,131.181818,2018-12-04
2,717,2019-03-10,1.0,2035.4885,2018-12-04
3,778,2019-12-27,1.0,335.988,2018-12-04
4,826,2020-02-19,1.0,342.292302,2018-12-04


In [78]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 911 entries, 0 to 910
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id_product        911 non-null    int64  
 1   order_date        911 non-null    object 
 2   product_quantity  911 non-null    float64
 3   product_price     911 non-null    float64
 4   entry_date        911 non-null    object 
dtypes: float64(2), int64(1), object(2)
memory usage: 35.7+ KB


In [79]:
df.order_date

0      2021-01-23
1      2020-04-02
2      2019-03-10
3      2019-12-27
4      2020-02-19
          ...    
906    2020-11-24
907    2020-11-24
908    2020-11-22
909    2021-01-26
910    2020-12-06
Name: order_date, Length: 911, dtype: object

Initially, the values in ``order_date`` are character strings and do **NOT** support any datetime operations (e.g. extracting the year or the day of the week). By applying the to_datetime function, pandas interprets the strings and converts them to datetime (i.e. ``datetime64[ns]``) objects. In pandas, these datetime objects, which are similar to datetime.datetime from the standard library, are called [pandas.Timestamp](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html#pandas.Timestamp).

In [80]:
pd.to_datetime(df.order_date)

0     2021-01-23
1     2020-04-02
2     2019-03-10
3     2019-12-27
4     2020-02-19
         ...    
906   2020-11-24
907   2020-11-24
908   2020-11-22
909   2021-01-26
910   2020-12-06
Name: order_date, Length: 911, dtype: datetime64[ns]

In [81]:
df.entry_date = pd.to_datetime(df.entry_date)
df.order_date = pd.to_datetime(df.order_date)

In [82]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 911 entries, 0 to 910
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   id_product        911 non-null    int64         
 1   order_date        911 non-null    datetime64[ns]
 2   product_quantity  911 non-null    float64       
 3   product_price     911 non-null    float64       
 4   entry_date        911 non-null    datetime64[ns]
dtypes: datetime64[ns](2), float64(2), int64(1)
memory usage: 35.7 KB


Now let's apply some aggregate methods to the datetime columns of the given dataset:

In [96]:
df.entry_date.min()

Timestamp('2018-12-04 00:00:00')

In [97]:
df.entry_date.max()

Timestamp('2020-11-26 00:00:00')

In [99]:
df.entry_date.max() - df.entry_date.min()

Timedelta('723 days 00:00:00')

In [100]:
a = pd.Series(["15-03-2020", "18-05-2019", "24-07-2018"])
a

0    15-03-2020
1    18-05-2019
2    24-07-2018
dtype: object

In [101]:
a.max()
# this is the maximum of the strings (alphabetical order), not the latest date

'24-07-2018'

In [102]:
pd.to_datetime(a, format = "%d-%m-%Y")
# %d reads the first part as the day, %m the second as the month and %Y the third as the four-digit year; the "-" characters match the separators in the strings

0   2020-03-15
1   2019-05-18
2   2018-07-24
dtype: datetime64[ns]
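
An explicit `format` matters most when a date string is ambiguous. The sketch below uses a made-up string in which both day and month are 12 or less, and compares the default parsing with an explicit day-first format:

```python
import pandas as pd

ambiguous = "03-04-2020"

# Without hints, pandas reads an ambiguous string month-first
month_first = pd.to_datetime(ambiguous)

# An explicit format states that the day comes first
day_first = pd.to_datetime(ambiguous, format="%d-%m-%Y")

print(month_first.date(), day_first.date())
```

Passing `dayfirst=True` instead of a full `format` string is another way to get the day-first reading.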

In [104]:
k = pd.to_datetime(a, format = "%d-%m-%Y").max()
k

Timestamp('2020-03-15 00:00:00')

In [105]:
s = pd.to_datetime(a, format = "%d-%m-%Y").min()
s

Timestamp('2018-07-24 00:00:00')

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Series.dt()</p>

<a id="2.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Accessor object for datetimelike properties of the Series values [SOURCE](https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html).

For comprehensive information about the datetimelike properties, please visit the [Official Pandas API Reference Document](https://pandas.pydata.org/pandas-docs/version/0.22/api.html#datetimelike-properties).


In [83]:
df.entry_date.dt.year

0      2018
1      2018
2      2018
3      2018
4      2018
       ... 
906    2020
907    2020
908    2020
909    2020
910    2020
Name: entry_date, Length: 911, dtype: int64

In [84]:
df.entry_date.dt.quarter

0      4
1      4
2      4
3      4
4      4
      ..
906    4
907    4
908    4
909    4
910    4
Name: entry_date, Length: 911, dtype: int64

In [85]:
df.entry_date.dt.dayofweek

0      1
1      1
2      1
3      1
4      1
      ..
906    2
907    2
908    4
909    1
910    3
Name: entry_date, Length: 911, dtype: int64

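Commonly used ``.dt`` properties: date, year, quarter, month, day, weekday, dayofweek, hour, minute, second, microsecond. Note that ``week`` was removed in pandas 2.0; use ``.dt.isocalendar().week`` instead.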
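
Several of these properties can be collected side by side. The two dates below are made up for illustration (they match the earliest and latest `entry_date` values shown earlier), and the names `dates` and `summary` are assumptions:

```python
import pandas as pd

# Hypothetical dates for illustration
dates = pd.Series(pd.to_datetime(["2018-12-04", "2020-11-26"]))

summary = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "dayofweek": dates.dt.dayofweek,          # Monday=0 ... Sunday=6
    "day_name": dates.dt.day_name(),          # a method, hence the parentheses
    "iso_week": dates.dt.isocalendar().week,  # ISO 8601 week number
})
print(summary)
```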

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Datetime Module</p>

<a id="2.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

The datetime module supplies classes for manipulating dates and times [SOURCE](https://docs.python.org/3/library/datetime.html).

### ``class datetime.datetime``

A combination of a date and a time. Attributes: year, month, day, hour, minute, second, microsecond, and tzinfo.
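
The attributes can be inspected on a fixed, made-up moment, which keeps the result reproducible (unlike `datetime.now()`). The name `moment` is an assumption for illustration:

```python
from datetime import datetime

# A fixed, made-up moment: 21 June 2018, 14:30:05
moment = datetime(2018, 6, 21, 14, 30, 5)

print(moment.year, moment.month, moment.day)
print(moment.hour, moment.minute, moment.second, moment.microsecond)
print(moment.tzinfo)  # None: the object is "naive" (no time zone attached)
```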


In [86]:
from datetime import datetime

In [87]:
datetime.now()

datetime.datetime(2022, 8, 19, 17, 49, 57, 332059)

In [88]:
print(datetime.now())

2022-08-19 17:50:05.961464


In [89]:
datetime.today()

datetime.datetime(2022, 8, 19, 17, 50, 11, 335497)

In [90]:
print(datetime.today())

2022-08-19 17:50:19.744618


In [91]:
current_datetime = datetime.now()

In [92]:
current_datetime

datetime.datetime(2022, 8, 19, 17, 50, 31, 294730)

In [93]:
current_datetime.date()

datetime.date(2022, 8, 19)

In [94]:
current_datetime.year

2022

In [95]:
current_datetime.weekday()
# Return the day of the week as an integer, where Monday is 0 and Sunday is 6.

4

In [96]:
current_datetime.isoweekday() 

# Return the day of the week as an integer, where Monday is 1 and Sunday is 7.

5

### ``class datetime.timedelta``

A duration expressing the difference between two date, time, or datetime instances to microsecond resolution [SOURCE](https://www.geeksforgeeks.org/manipulate-date-and-time-with-the-datetime-module-in-python/).
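
A `timedelta` also appears as the result of subtracting two datetimes. The sketch below uses made-up `start` and `end` values to show the whole-day part and the full duration in seconds:

```python
from datetime import datetime, timedelta

# Hypothetical start and end moments
start = datetime(2018, 12, 4)
end = datetime(2020, 11, 26, 6, 0)

gap = end - start              # subtracting datetimes yields a timedelta
print(gap.days)                # whole days only
print(gap.seconds)             # leftover seconds within the last day
print(gap.total_seconds())     # the full duration expressed in seconds

one_week_later = start + timedelta(weeks=1)
print(one_week_later)
```

Note that `.seconds` holds only the remainder below one day, so `total_seconds()` is the safer choice for a full duration.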

In [97]:
from datetime import timedelta

In [98]:
two_days_before = current_datetime - timedelta(days = 2)

In [99]:
two_days_before

datetime.datetime(2022, 8, 17, 17, 50, 31, 294730)

In [101]:
current_datetime - timedelta(days=3, minutes=10, hours=4, weeks=2)
# 2 weeks, 3 days, 4 hours and 10 minutes earlier

datetime.datetime(2022, 8, 2, 13, 40, 31, 294730)

In [102]:
# take one snapshot of "now" and reuse the same offset so the three lines agree
now = datetime.now()
delta = timedelta(weeks=2, days=3, hours=4, minutes=10)
print(f"{'current_date': <15}", now)
print(f"{'plus': <15}", delta)
print(f"{'total': <15}", now + delta)

current_date    2022-08-19 17:51:57.520144
plus            17 days, 4:10:00
total           2022-09-05 22:01:57.520144


### ``strftime()``

**Converting** from a date/time/datetime object **to string type** [SOURCE](https://strftime.org/)

In [103]:
current_datetime

datetime.datetime(2022, 8, 19, 17, 50, 31, 294730)

In [130]:
type(current_datetime)

datetime.datetime

In [131]:
current_datetime.year

2022

In [132]:
type(current_datetime.year)

int

**Note the difference:** ``.year`` returns an ``int``, whereas ``strftime("%Y")`` returns a ``str``.

In [133]:
current_datetime.strftime("%Y")
# the year part as a string

'2022'

In [104]:
print("current_datetime :", current_datetime)
year = current_datetime.strftime("%Y")
print("year:", year)
month = current_datetime.strftime("%m")
print("month:", month)
day = current_datetime.strftime("%d")
print("day:", day)
time = current_datetime.strftime("%H:%M:%S")
print("time:", time)
date_time = current_datetime.strftime("%m/%d/%Y, %H:%M:%S")
print("date and time:", date_time)

current_datetime : 2022-08-19 17:50:31.294730
year: 2022
month: 08
day: 19
time: 17:50:31
date and time: 08/19/2022, 17:50:31


### ``strptime()``

**Converting** from string type **to datetime object**

In [105]:
date_string = "21 June, 2018"
date_string

'21 June, 2018'

In [106]:
datetime.strptime(date_string,"%d %B, %Y")

datetime.datetime(2018, 6, 21, 0, 0)

In [107]:
pd.to_datetime(date_string)

Timestamp('2018-06-21 00:00:00')

In [109]:
datetime_object = datetime.strptime(date_string, "%d %B, %Y")
datetime_object

datetime.datetime(2018, 6, 21, 0, 0)

In [110]:
type(datetime_object)

datetime.datetime

In [111]:
datetime_object.year

2018
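
`strptime()` and `strftime()` are inverses of each other when they share the same format string. The sketch below shows a round trip and what happens when the format does not match. The names `text`, `fmt` and `mismatch_raised` are illustrative, and the month name `%B` assumes an English locale:

```python
from datetime import datetime

text = "21 June, 2018"
fmt = "%d %B, %Y"

parsed = datetime.strptime(text, fmt)   # str -> datetime
back = parsed.strftime(fmt)             # datetime -> str

# A format that does not match the text raises ValueError
try:
    datetime.strptime(text, "%Y-%m-%d")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(parsed, back, mismatch_raised)
```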

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Operation with Datetime Object</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

## Let's compute the time between the first order date and the entry date for each product

In [112]:
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date
0,401,2021-01-23,1.0,541.487603,2018-12-04
1,416,2020-04-02,1.0,131.181818,2018-12-04
2,717,2019-03-10,1.0,2035.488500,2018-12-04
3,778,2019-12-27,1.0,335.988000,2018-12-04
4,826,2020-02-19,1.0,342.292302,2018-12-04
...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07
908,1536887,2020-11-22,1.0,0.000000,2020-11-13
909,1536952,2021-01-26,1.0,988.429752,2020-11-24


In [144]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 911 entries, 0 to 910
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   id_product        911 non-null    int64         
 1   order_date        911 non-null    datetime64[ns]
 2   product_quantity  911 non-null    float64       
 3   product_price     911 non-null    float64       
 4   entry_date        911 non-null    datetime64[ns]
dtypes: datetime64[ns](2), float64(2), int64(1)
memory usage: 35.7 KB


**Let us subtract the two dates, then extract the number of days with string methods**

In [113]:
df.order_date - df.entry_date

0     781 days
1     485 days
2      96 days
3     388 days
4     442 days
        ...   
906    48 days
907    48 days
908     9 days
909    63 days
910    10 days
Length: 911, dtype: timedelta64[ns]

In [114]:
df["time_delta"] = df.order_date - df.entry_date
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta
0,401,2021-01-23,1.0,541.487603,2018-12-04,781 days
1,416,2020-04-02,1.0,131.181818,2018-12-04,485 days
2,717,2019-03-10,1.0,2035.488500,2018-12-04,96 days
3,778,2019-12-27,1.0,335.988000,2018-12-04,388 days
4,826,2020-02-19,1.0,342.292302,2018-12-04,442 days
...,...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48 days
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48 days
908,1536887,2020-11-22,1.0,0.000000,2020-11-13,9 days
909,1536952,2021-01-26,1.0,988.429752,2020-11-24,63 days


In [115]:
df.time_delta.astype("str").str.split(' ').str[0].astype(int)

0      781
1      485
2       96
3      388
4      442
      ... 
906     48
907     48
908      9
909     63
910     10
Name: time_delta, Length: 911, dtype: int32
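
The string round trip above works, but the same day counts are also available directly through the ``.dt`` accessor of a timedelta Series. A minimal sketch on a made-up two-row frame (`demo` is an assumption, not the notebook's `df`):

```python
import pandas as pd

# Hypothetical two-row frame mirroring the columns used above
demo = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-23", "2020-04-02"]),
    "entry_date": pd.to_datetime(["2018-12-04", "2018-12-04"]),
})

delta = demo["order_date"] - demo["entry_date"]   # timedelta64 Series
days = delta.dt.days                               # integer day counts
print(days.tolist())
```

This avoids converting to text and back, and it keeps working if the string representation of timedeltas ever changes.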

In [116]:
df.time_delta = df.time_delta.astype("str").str.split(' ').str[0].astype(int)
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta
0,401,2021-01-23,1.0,541.487603,2018-12-04,781
1,416,2020-04-02,1.0,131.181818,2018-12-04,485
2,717,2019-03-10,1.0,2035.488500,2018-12-04,96
3,778,2019-12-27,1.0,335.988000,2018-12-04,388
4,826,2020-02-19,1.0,342.292302,2018-12-04,442
...,...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48
908,1536887,2020-11-22,1.0,0.000000,2020-11-13,9
909,1536952,2021-01-26,1.0,988.429752,2020-11-24,63


In [117]:
df.groupby("id_product").time_delta.min()

id_product
401        781
416        485
717         96
778        388
826        442
          ... 
1536841     38
1536842     48
1536887      9
1536952     63
1536974     10
Name: time_delta, Length: 498, dtype: int32

In [118]:
df.groupby("id_product").time_delta.transform("min")

0      781
1      485
2       96
3      388
4      442
      ... 
906     48
907     48
908      9
909     63
910     10
Name: time_delta, Length: 911, dtype: int32
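
The difference between the two previous cells is the shape of the result: an aggregation returns one value per group, while `transform` broadcasts that value back to every original row. A sketch on made-up data (the frame `demo` is an assumption for illustration):

```python
import pandas as pd

# Made-up data: product 1 was ordered twice, product 2 once
demo = pd.DataFrame({"id_product": [1, 1, 2], "time_delta": [30, 12, 7]})

grouped = demo.groupby("id_product")["time_delta"]

per_group = grouped.min()            # one row per product
per_row = grouped.transform("min")   # one row per original row

print(per_group.tolist(), per_row.tolist())
```

Because the transformed result has the same length and index as the original frame, it can be assigned straight to a new column.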

In [119]:
df["passing_time_to_firstsale"] = df.groupby("id_product").time_delta.transform("min")
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale
0,401,2021-01-23,1.0,541.487603,2018-12-04,781,781
1,416,2020-04-02,1.0,131.181818,2018-12-04,485,485
2,717,2019-03-10,1.0,2035.488500,2018-12-04,96,96
3,778,2019-12-27,1.0,335.988000,2018-12-04,388,388
4,826,2020-02-19,1.0,342.292302,2018-12-04,442,442
...,...,...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48
908,1536887,2020-11-22,1.0,0.000000,2020-11-13,9,9
909,1536952,2021-01-26,1.0,988.429752,2020-11-24,63,63


In [120]:
df.sample(20)

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale
604,6975,2020-11-08,1.0,307.363637,2018-12-04,705,219
161,3677,2020-06-21,1.0,540.909091,2018-12-04,565,454
816,1530658,2020-11-02,1.0,836.0,2019-07-14,477,113
864,1531000,2020-07-16,1.0,473.636363,2019-09-11,309,309
794,1525140,2019-08-30,4.0,150.025,2018-12-04,269,269
524,6329,2020-06-04,1.0,1768.181819,2020-08-24,-81,-81
69,3110,2019-10-21,1.0,209.988,2018-12-04,321,321
2,717,2019-03-10,1.0,2035.4885,2018-12-04,96,96
316,4011,2020-03-07,1.0,98.9868,2018-12-04,459,361
252,3767,2019-09-01,1.0,482.9885,2018-12-04,271,271


In [121]:
df[(df.time_delta != df.passing_time_to_firstsale)]

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale
15,1031,2018-12-16,1.0,234.990000,2018-12-04,12,-16
25,1410,2020-11-29,1.0,227.988000,2018-12-04,726,404
39,2051,2020-10-21,1.0,329.000000,2018-12-04,687,670
41,2065,2019-06-20,1.0,246.662350,2018-12-04,198,-17
43,2072,2019-07-03,1.0,189.737350,2018-12-04,211,185
...,...,...,...,...,...,...,...
898,1536753,2020-09-04,1.0,450.000000,2020-08-15,20,-38
899,1536753,2020-11-23,1.0,450.000000,2020-08-15,100,-38
900,1536753,2020-11-25,2.0,408.000000,2020-08-15,102,-38
904,1536841,2020-11-20,1.0,988.429752,2020-10-07,44,38


In [122]:
df.groupby("id_product")["id_product"].count().sort_values(ascending=False)

id_product
3674       33
3671       21
3711       15
11364      13
1525105    13
           ..
4791        1
4745        1
4713        1
4699        1
1536974     1
Name: id_product, Length: 498, dtype: int64

In [123]:
df[(df.id_product == 3674)].sort_values("time_delta")

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale
116,3674,2019-06-17,1.0,528.9885,2018-12-04,195,195
117,3674,2020-01-08,1.0,551.988,2018-12-04,400,195
118,3674,2020-01-13,1.0,551.988,2018-12-04,405,195
119,3674,2020-01-23,1.0,551.988,2018-12-04,415,195
120,3674,2020-01-28,1.0,551.988,2018-12-04,420,195
121,3674,2020-02-16,1.0,607.1868,2018-12-04,439,195
122,3674,2020-02-21,1.0,607.1868,2018-12-04,444,195
123,3674,2020-03-06,1.0,607.1868,2018-12-04,458,195
124,3674,2020-03-08,1.0,607.1868,2018-12-04,460,195
125,3674,2020-03-09,1.0,607.1868,2018-12-04,461,195


## Let's compute the time between the last order date and today for each product

**This time, let us do it by datetime properties**

In [124]:
df.groupby("id_product").order_date.max()

id_product
401       2021-01-23
416       2020-04-02
717       2019-03-10
778       2019-12-27
826       2020-02-19
             ...    
1536841   2020-11-22
1536842   2020-11-24
1536887   2020-11-22
1536952   2021-01-26
1536974   2020-12-06
Name: order_date, Length: 498, dtype: datetime64[ns]

In [125]:
df.groupby("id_product").order_date.transform("max").dt.date

0      2021-01-23
1      2020-04-02
2      2019-03-10
3      2019-12-27
4      2020-02-19
          ...    
906    2020-11-24
907    2020-11-24
908    2020-11-22
909    2021-01-26
910    2020-12-06
Name: order_date, Length: 911, dtype: object

In [128]:
last_order_date = df.groupby("id_product").order_date.transform("max").dt.date

In [129]:
today = pd.to_datetime("27-02-2021", format='%d-%m-%Y').date()
print(today)

2021-02-27


In [130]:
today - last_order_date

0      35 days
1     331 days
2     720 days
3     428 days
4     374 days
        ...   
906    95 days
907    95 days
908    97 days
909    32 days
910    83 days
Name: order_date, Length: 911, dtype: timedelta64[ns]
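
The subtraction can also stay in datetime64 throughout, which avoids the detour through ``.dt.date`` and object dtype. A sketch under the assumption of a made-up `demo` frame and a fixed `today` equal to the reference date used above:

```python
import pandas as pd

# Made-up orders; product 416 has two orders, the later one counts
demo = pd.DataFrame({
    "id_product": [401, 416, 416],
    "order_date": pd.to_datetime(["2021-01-23", "2020-03-01", "2020-04-02"]),
})
today = pd.Timestamp("2021-02-27")

# Latest order per product, broadcast back to every row
last_order = demo.groupby("id_product")["order_date"].transform("max")

# Timestamp minus datetime64 Series -> timedelta64 Series -> integer days
days_since = (today - last_order).dt.days
print(days_since.tolist())
```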

In [131]:
df["passing_time_from_lastsale"] = today - last_order_date

In [132]:
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale,passing_time_from_lastsale
0,401,2021-01-23,1.0,541.487603,2018-12-04,781,781,35 days
1,416,2020-04-02,1.0,131.181818,2018-12-04,485,485,331 days
2,717,2019-03-10,1.0,2035.488500,2018-12-04,96,96,720 days
3,778,2019-12-27,1.0,335.988000,2018-12-04,388,388,428 days
4,826,2020-02-19,1.0,342.292302,2018-12-04,442,442,374 days
...,...,...,...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48,95 days
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48,95 days
908,1536887,2020-11-22,1.0,0.000000,2020-11-13,9,9,97 days
909,1536952,2021-01-26,1.0,988.429752,2020-11-24,63,63,32 days


In [133]:
df["passing_time_from_lastsale"] = df["passing_time_from_lastsale"].astype("str").str.split(" ").str[0].astype(int)
df

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale,passing_time_from_lastsale
0,401,2021-01-23,1.0,541.487603,2018-12-04,781,781,35
1,416,2020-04-02,1.0,131.181818,2018-12-04,485,485,331
2,717,2019-03-10,1.0,2035.488500,2018-12-04,96,96,720
3,778,2019-12-27,1.0,335.988000,2018-12-04,388,388,428
4,826,2020-02-19,1.0,342.292302,2018-12-04,442,442,374
...,...,...,...,...,...,...,...,...
906,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48,95
907,1536842,2020-11-24,1.0,1186.776860,2020-10-07,48,48,95
908,1536887,2020-11-22,1.0,0.000000,2020-11-13,9,9,97
909,1536952,2021-01-26,1.0,988.429752,2020-11-24,63,63,32


In [134]:
df[(df.id_product == 3674)].sort_values("time_delta")

Unnamed: 0,id_product,order_date,product_quantity,product_price,entry_date,time_delta,passing_time_to_firstsale,passing_time_from_lastsale
116,3674,2019-06-17,1.0,528.9885,2018-12-04,195,195,82
117,3674,2020-01-08,1.0,551.988,2018-12-04,400,195,82
118,3674,2020-01-13,1.0,551.988,2018-12-04,405,195,82
119,3674,2020-01-23,1.0,551.988,2018-12-04,415,195,82
120,3674,2020-01-28,1.0,551.988,2018-12-04,420,195,82
121,3674,2020-02-16,1.0,607.1868,2018-12-04,439,195,82
122,3674,2020-02-21,1.0,607.1868,2018-12-04,444,195,82
123,3674,2020-03-06,1.0,607.1868,2018-12-04,458,195,82
124,3674,2020-03-08,1.0,607.1868,2018-12-04,460,195,82
125,3674,2020-03-09,1.0,607.1868,2018-12-04,461,195,82


## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:150%; text-align:center; border-radius:10px 10px;">The End of Session - 10</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>