# <span style="color:#130654; font-family: Helvetica; font-size: 200%; font-weight:700"> Pandas | <span style="font-size: 50%; font-weight:300">String Functions</span></span>

To use pandas in Python, first import it with the following command:

```python
# import pandas
import pandas as pd
```

Pandas provides a set of vectorized string functions that make it easy to operate on string data. Importantly, these functions skip missing/NaN values: a missing entry stays missing in the result instead of raising an error.

<span style="color:green">Note: Always call these functions through the ".str" accessor of a Series or Index, e.g. df['emp_name'].str.lower().</span>
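A minimal sketch of both points above, using a small hypothetical Series `s`: the missing value passes through unchanged, and calling a string method directly on the Series (without `.str`) fails because a Series has no `lower` method.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series(["Alpha", np.nan, "Gamma"])

# The .str accessor applies lower() element-wise; NaN stays NaN
lowered = s.str.lower()
print(lowered.tolist())  # ['alpha', nan, 'gamma']

# Without .str, the Series itself has no lower() method
try:
    s.lower()
except AttributeError as err:
    print("AttributeError:", err)
```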

| Function                | Description                                                  |
| ----------------------- | ------------------------------------------------------------ |
| **lower**()             | Converts strings in the Series/Index to lower case.          |
| **upper**()             | Converts strings in the Series/Index to upper case.          |
| **len**()               | Computes the length of each string.                          |
| **strip**()             | Strips whitespace (including newlines) from both sides of each string in the Series/Index. |
| **split**(' ')          | Splits each string on the given separator/pattern.           |
| **cat(sep=' ')**        | Concatenates the Series/Index elements into one string, using the given separator. |
| **get_dummies**()       | Returns a DataFrame of one-hot encoded (dummy) values; each string is first split on `sep` (default: the pipe character). |
| **contains(pattern)**   | Returns True for each element that contains the pattern (a regex by default), else False. |
| **replace(a,b)**        | Replaces occurrences of the pattern *a* with *b* in each element. |
| **repeat(value)**       | Repeats each string the specified number of times.           |
| **count(pattern)**      | Returns the number of occurrences of the pattern in each element. |
| **startswith(pattern)** | Returns True if the element in the Series/Index starts with the pattern. |
| **endswith(pattern)**   | Returns True if the element in the Series/Index ends with the pattern. |
| **find(pattern)**       | Returns the position of the first occurrence of the substring in each element, or -1 if it is not found. |
| **findall(pattern)**    | Returns a list of all occurrences of the pattern in each element. |
| **swapcase**()          | Swaps lower case and upper case.                             |
| **islower()**           | Checks whether all cased characters in each string in the Series/Index are lower case. Returns Boolean. |
| **isupper()**           | Checks whether all cased characters in each string in the Series/Index are upper case. Returns Boolean. |
| **isnumeric()**         | Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean. |
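`contains()` and `swapcase()` are listed in the table but not used in the walkthrough below. A minimal sketch on a hypothetical Series of department names; note that `contains()` is case-sensitive unless `case=False` is passed.

```python
import pandas as pd

# Hypothetical sample of department names
depts = pd.Series(["Ops", "Product", "Sales"])

# contains(): True where the pattern occurs in the element (case-sensitive)
has_o = depts.str.contains("o")
print(has_o.tolist())  # [False, True, False]

# case=False ignores case, so the capital "O" in "Ops" also matches
has_o_any_case = depts.str.contains("o", case=False)
print(has_o_any_case.tolist())  # [True, True, False]

# swapcase(): upper-case letters become lower-case and vice versa
swapped = depts.str.swapcase()
print(swapped.tolist())  # ['oPS', 'pRODUCT', 'sALES']
```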

### <span style="color:#130654">Create DataFrame</span>

Create a dataset using a dictionary:

```python
data = {'emp_name': ['NEHA KULKARNI   ', ' RAJ PANNAL', 'JAY SINHA '],
        'dept_name': ['ops', 'product', 'sales'],
        'salary': [394800, 400000, 600000]}
```

```python
data
```

```
{'emp_name': ['NEHA KULKARNI   ', ' RAJ PANNAL', 'JAY SINHA '],
 'dept_name': ['ops', 'product', 'sales'],
 'salary': [394800, 400000, 600000]}
```

Let's create a DataFrame; pandas assigns a default integer index (0, 1, 2, ...) automatically.

```python
df = pd.DataFrame(data)
print(df)
```

```
           emp_name dept_name  salary
0  NEHA KULKARNI          ops  394800
1        RAJ PANNAL   product  400000
2        JAY SINHA      sales  600000
```


<br>

### <span style="color:#130654">String Functions</span>

`strip()` - remove leading and trailing whitespace from the employee names, e.g. the trailing spaces in "NEHA KULKARNI   ".

```python
df['emp_name'] = df['emp_name'].str.strip()
df
```

|   | emp_name | dept_name | salary |
|---|----------|-----------|--------|
| 0 | NEHA KULKARNI | ops | 394800 |
| 1 | RAJ PANNAL | product | 400000 |
| 2 | JAY SINHA | sales | 600000 |
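As the table above notes, `strip()` removes any surrounding whitespace, not only spaces. A small sketch with hypothetical padded names containing a tab and a newline; the space between first and last name is kept because it is not at either end.

```python
import pandas as pd

# Hypothetical names padded with a tab, spaces and a newline
raw = pd.Series(["\tNEHA KULKARNI  ", " RAJ PANNAL\n"])

# strip() removes leading/trailing whitespace, including \t and \n
clean = raw.str.strip()
print(clean.tolist())  # ['NEHA KULKARNI', 'RAJ PANNAL']
```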


`lower()` - convert upper case employee names to lower case

```python
df['emp_name'] = df['emp_name'].str.lower()
df
```

|   | emp_name | dept_name | salary |
|---|----------|-----------|--------|
| 0 | neha kulkarni | ops | 394800 |
| 1 | raj pannal | product | 400000 |
| 2 | jay sinha | sales | 600000 |


`upper()` - convert lower case department names to upper case

```python
df['dept_name'] = df['dept_name'].str.upper()
df
```

|   | emp_name | dept_name | salary |
|---|----------|-----------|--------|
| 0 | neha kulkarni | OPS | 394800 |
| 1 | raj pannal | PRODUCT | 400000 |
| 2 | jay sinha | SALES | 600000 |


`len()` - compute the length of each employee name

```python
df['empn_len'] = df['emp_name'].str.len()
df
```

|   | emp_name | dept_name | salary | empn_len |
|---|----------|-----------|--------|----------|
| 0 | neha kulkarni | OPS | 394800 | 13 |
| 1 | raj pannal | PRODUCT | 400000 | 10 |
| 2 | jay sinha | SALES | 600000 | 9 |


`split()` - split each employee name into first and last name at the space " ".<br>
`expand=True` returns the pieces as separate columns instead of one list per row.

```python
df[['first_name', 'last_name']] = df['emp_name'].str.split(" ", expand=True)
df
```

|   | emp_name | dept_name | salary | empn_len | first_name | last_name |
|---|----------|-----------|--------|----------|------------|-----------|
| 0 | neha kulkarni | OPS | 394800 | 13 | neha | kulkarni |
| 1 | raj pannal | PRODUCT | 400000 | 10 | raj | pannal |
| 2 | jay sinha | SALES | 600000 | 9 | jay | sinha |
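The two-column assignment above works because every name has exactly one space. A sketch with a hypothetical name containing a middle name: without `expand`, each element becomes a list; with `n=1`, only the first space is used, so `expand=True` always yields exactly two columns (otherwise a third column would appear and no longer match the two target columns).

```python
import pandas as pd

# Hypothetical names; the second one has a middle name (two spaces)
names = pd.Series(["neha kulkarni", "jay kumar sinha"])

# Without expand, each element becomes a Python list
as_lists = names.str.split(" ")
print(as_lists.tolist())  # [['neha', 'kulkarni'], ['jay', 'kumar', 'sinha']]

# n=1 splits at the first space only, giving exactly two columns
parts = names.str.split(" ", n=1, expand=True)
print(parts)
```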


Take the first 3 characters of the first-name and last-name columns, join them with "&" and append "@email" to create a user name.

```python
df['user_name'] = df['first_name'].astype(str).str[:3] + "&" + df['last_name'].astype(str).str[:3] + "@email"
df
```

|   | emp_name | dept_name | salary | empn_len | first_name | last_name | user_name |
|---|----------|-----------|--------|----------|------------|-----------|-----------|
| 0 | neha kulkarni | OPS | 394800 | 13 | neha | kulkarni | neh&kul@email |
| 1 | raj pannal | PRODUCT | 400000 | 10 | raj | pannal | raj&pan@email |
| 2 | jay sinha | SALES | 600000 | 9 | jay | sinha | jay&sin@email |
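Indexing through `.str` (as in `.str[:3]` above) slices every element like a plain Python string. A small sketch showing that `Series.str.slice()` is the explicit method form of the same operation:

```python
import pandas as pd

first = pd.Series(["neha", "raj", "jay"])

# Slice notation through .str, like "neha"[:3] on a Python string
via_index = first.str[:3]
# The equivalent explicit method call
via_slice = first.str.slice(0, 3)

print(via_index.tolist())           # ['neh', 'raj', 'jay']
print(via_index.equals(via_slice))  # True
```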


`replace()` - replace "&" in the user_name column with "_"

```python
df['user_name'] = df['user_name'].str.replace("&", "_")
df
```

|   | emp_name | dept_name | salary | empn_len | first_name | last_name | user_name |
|---|----------|-----------|--------|----------|------------|-----------|-----------|
| 0 | neha kulkarni | OPS | 394800 | 13 | neha | kulkarni | neh_kul@email |
| 1 | raj pannal | PRODUCT | 400000 | 10 | raj | pannal | raj_pan@email |
| 2 | jay sinha | SALES | 600000 | 9 | jay | sinha | jay_sin@email |
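"&" has no special meaning in regular expressions, so the call above behaves the same either way. For characters such as ".", it matters whether the pattern is treated as a regex, and the default for `regex` has differed between pandas versions, so passing it explicitly is safer. A sketch on a hypothetical user name:

```python
import pandas as pd

s = pd.Series(["neh.kul@email"])

# regex=False: "." is matched literally, only the dot is replaced
literal = s.str.replace(".", "_", regex=False)
# regex=True: "." is a wildcard, so every one of the 13 characters is replaced
wildcard = s.str.replace(".", "_", regex=True)

print(literal[0])   # neh_kul@email
print(wildcard[0])
```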


`repeat()` - repeat each string. Without `.str`, `Series.repeat()` repeats the elements themselves (duplicating rows, i.e. "vertically") instead of the text inside each element.

```python
ch1 = df['first_name'].str[0:2].str.repeat(2)      # take first 2 characters of first name and repeat twice
ch2 = df['empn_len'].astype('str').str.repeat(2)   # convert empn_len to string and repeat twice
ch3 = "@"
ch4 = df['last_name'].str[-1:].str.repeat(2)       # take last character of last name and repeat twice

df['email_pass'] = ch1 + ch2 + ch3 + ch4

df
```

|   | emp_name | dept_name | salary | empn_len | first_name | last_name | user_name | email_pass |
|---|----------|-----------|--------|----------|------------|-----------|-----------|------------|
| 0 | neha kulkarni | OPS | 394800 | 13 | neha | kulkarni | neh_kul@email | nene1313@ii |
| 1 | raj pannal | PRODUCT | 400000 | 10 | raj | pannal | raj_pan@email | rara1010@ll |
| 2 | jay sinha | SALES | 600000 | 9 | jay | sinha | jay_sin@email | jaja99@aa |
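The difference between the two kinds of repeat, sketched on a hypothetical two-element Series:

```python
import pandas as pd

s = pd.Series(["ab", "cd"])

# Series.str.repeat: repeats the text inside each element (row count unchanged)
text_repeated = s.str.repeat(2)
print(text_repeated.tolist())  # ['abab', 'cdcd']

# Series.repeat: repeats the elements themselves (rows are duplicated)
rows_repeated = s.repeat(2)
print(rows_repeated.tolist())  # ['ab', 'ab', 'cd', 'cd']
```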


<br>

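`get_dummies()` - one-hot encode the employee names. No name contains the default separator "|", so each whole name becomes one indicator column.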

```python
df['emp_name'].str.get_dummies()
```

|   | jay sinha | neha kulkarni | raj pannal |
|---|-----------|---------------|------------|
| 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 |
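`str.get_dummies()` is most useful when a single string holds several tags. A sketch with a hypothetical "skills" column, where each employee's skills are separated by "|": every distinct tag becomes one indicator column, and the columns come out in sorted order.

```python
import pandas as pd

# Hypothetical skill tags, several per employee, separated by "|"
skills = pd.Series(["sql|python", "python", "excel"])

# Split each string on sep and create one 0/1 column per distinct tag
dummies = skills.str.get_dummies(sep="|")
print(dummies)
```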


`cat()` - concatenate the whole employee name column into one continuous string, with "_" between consecutive names

```python
df['emp_name'].str.cat(sep="_")
```

```
'neha kulkarni_raj pannal_jay sinha'
```
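Missing values are consistent with the NaN behaviour described at the top: `cat()` leaves them out of the joined string by default, and `na_rep` substitutes a placeholder instead. A sketch on a hypothetical Series with one missing name:

```python
import numpy as np
import pandas as pd

s = pd.Series(["neha", np.nan, "jay"])

# Default: the missing value is omitted from the result
joined = s.str.cat(sep="_")
print(joined)  # neha_jay

# na_rep: the missing value is replaced by a placeholder
joined_with_placeholder = s.str.cat(sep="_", na_rep="?")
print(joined_with_placeholder)  # neha_?_jay
```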

`count()` - count occurrences of "a" in each employee name

```python
df['emp_name'].str.count("a")
```

```
0    2
1    3
2    2
Name: emp_name, dtype: int64
```
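The pattern passed to `count()` is a regular expression, so characters like "." need escaping, and case can be ignored through `flags`. A sketch on hypothetical dotted names:

```python
import re

import pandas as pd

s = pd.Series(["neha.kulkarni", "raj.pannal"])

# "." is a regex wildcard, so every character is counted
any_char = s.str.count(".")
print(any_char.tolist())  # [13, 10]

# re.escape() turns "." into a literal dot
dots = s.str.count(re.escape("."))
print(dots.tolist())  # [1, 1]

# flags=re.IGNORECASE counts "a" and "A" alike
mixed = pd.Series(["Aa"]).str.count("a", flags=re.IGNORECASE)
print(mixed.tolist())  # [2]
```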

`startswith()` - names which start with 'r'

```python
df['emp_name'].str.startswith("r")
```

```
0    False
1     True
2    False
Name: emp_name, dtype: bool
```

`endswith()` - names which end with 'i'

```python
df['emp_name'].str.endswith("i")
```

```
0     True
1    False
2    False
Name: emp_name, dtype: bool
```
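The Boolean Series returned by `startswith()` and `endswith()` can be used directly as a mask to filter rows. A sketch that rebuilds a small copy of the employee data (only two of its columns) so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"emp_name": ["neha kulkarni", "raj pannal", "jay sinha"],
                   "salary": [394800, 400000, 600000]})

# Keep only the rows where the mask is True
starts_r = df[df["emp_name"].str.startswith("r")]
ends_i = df[df["emp_name"].str.endswith("i")]

print(starts_r["emp_name"].tolist())  # ['raj pannal']
print(ends_i["emp_name"].tolist())    # ['neha kulkarni']
```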

`find()` - position of the first occurrence of the letter 'a' in each employee name.<br>
Positions start at 0.

```python
df['emp_name'].str.find("a")
```

```
0    3
1    1
2    1
Name: emp_name, dtype: int64
```
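Two further details of `find()`: it returns -1 when the substring does not occur, and the optional `start` argument begins the search later in the string (positions are still counted from the start of the string). A sketch on the same three names:

```python
import pandas as pd

names = pd.Series(["neha kulkarni", "raj pannal", "jay sinha"])

# No "z" in any name, so every result is -1
missing = names.str.find("z")
print(missing.tolist())  # [-1, -1, -1]

# Search for "a" starting at position 2
from_two = names.str.find("a", start=2)
print(from_two.tolist())  # [3, 5, 8]
```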

`findall()` - all occurrences of "a" in each name, as a list.

```python
df['emp_name'].str.findall("a")
```

```
0       [a, a]
1    [a, a, a]
2       [a, a]
Name: emp_name, dtype: object
```
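`findall()` also takes a regular expression. A sketch using a character class of vowels; applying `.str.len()` to the resulting lists gives the number of matches, the same numbers `count()` returns for that pattern.

```python
import pandas as pd

names = pd.Series(["neha kulkarni", "raj pannal", "jay sinha"])

# All vowels in each name, as a list per row
vowels = names.str.findall(r"[aeiou]")
print(vowels.tolist())
# [['e', 'a', 'u', 'a', 'i'], ['a', 'a', 'a'], ['a', 'i', 'a']]

# Length of each list equals the count of matches
print(vowels.str.len().tolist() == names.str.count(r"[aeiou]").tolist())  # True
```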

`islower()` - check if employee name is in lower case.

```python
df['emp_name'].str.islower()
```

```
0    True
1    True
2    True
Name: emp_name, dtype: bool
```

`isupper()` - check if employee name is in upper case.

```python
df['emp_name'].str.isupper()
```

```
0    False
1    False
2    False
Name: emp_name, dtype: bool
```
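`islower()` and `isupper()` only look at cased characters (letters), so spaces and digits do not affect the result, while a single capital letter does. A sketch on hypothetical sample strings:

```python
import pandas as pd

samples = pd.Series(["neha", "NEHA", "Neha", "neha 42"])

# Mixed case ("Neha") is neither all-lower nor all-upper
lower_flags = samples.str.islower()
upper_flags = samples.str.isupper()

print(lower_flags.tolist())  # [True, False, False, True]
print(upper_flags.tolist())  # [False, True, False, False]
```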

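`isnumeric()` - check whether the salary values are numeric. The `.str` methods only work on strings, so the integer column is first converted with `astype('str')`.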

```python
df['salary'].astype('str').str.isnumeric()
```

```
0    True
1    True
2    True
Name: salary, dtype: bool
```
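`isnumeric()` checks characters, not whether a string parses as a number: a decimal point or a minus sign makes it False, and so does an empty string. A sketch on hypothetical values; if the goal is "can this be converted to a number", `pd.to_numeric(..., errors="coerce")` followed by `notna()` is one alternative.

```python
import pandas as pd

values = pd.Series(["394800", "12.5", "-7", ""])

# Every character must be numeric, so "." and "-" fail
char_check = values.str.isnumeric()
print(char_check.tolist())  # [True, False, False, False]

# Parse-based check: unparseable strings become NaN
parses = pd.to_numeric(values, errors="coerce").notna()
print(parses.tolist())  # [True, True, True, False]
```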