**Feature Split**

Splitting a feature into several parts often makes it more useful for machine learning. Datasets frequently contain string columns that violate tidy data principles by packing more than one piece of information into a single value.


**Advantages:**
*   It lets machine learning algorithms make use of the information in these columns.
*   It makes it possible to bin and group the values.
*   It can improve model performance by uncovering potentially useful information.

The `split` function is a good option, but there is no single way to split a feature. How to split a column depends on its characteristics.
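
As a rough illustration of how the approach depends on the column, the sketch below handles two made-up columns differently: one with a fixed delimiter, and one where the wanted piece follows a pattern. The data and the column names (`location`, `title`, `city`, `country`, `year`) are hypothetical and not part of the dataset used later.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "location": ["Paris, France", "Austin, USA"],
    "title": ["Toy Story (1995)", "Heat (1995)"],
})

# Fixed delimiter: split once on ", " and expand the pieces into two columns.
toy[["city", "country"]] = toy["location"].str.split(", ", n=1, expand=True)

# Pattern rather than delimiter: pull the four-digit year out of the parentheses.
toy["year"] = toy["title"].str.extract(r"\((\d{4})\)", expand=False).astype(int)

print(toy[["city", "country", "year"]])
```

A plain split works well when the separator is consistent; a regular expression is usually easier when the piece is identified by its shape rather than its position.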



**Example**

```
       Name column                          Extracting first names
                                 
0  Luther N. Gonzalez                             Luther
1    Charles M. Young                             Charles
2        Terry Lawson       ------>               Terry
3       Kristen White                             Kristen
4      Thomas Logsdon                             Thomas

```
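
The transformation pictured above can be sketched in pandas as follows. This is a minimal example that rebuilds the five names by hand; the variable names `names` and `first_names` are chosen here for illustration.

```python
import pandas as pd

# The five names from the illustration above.
names = pd.Series([
    "Luther N. Gonzalez",
    "Charles M. Young",
    "Terry Lawson",
    "Kristen White",
    "Thomas Logsdon",
], name="name")

# Split each name on whitespace and keep the first token.
first_names = names.str.split().str[0]
print(first_names.tolist())
# ['Luther', 'Charles', 'Terry', 'Kristen', 'Thomas']
```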

**Basics**

In [13]:
text = 'The quick brown fox jumps over the lazy dog'
  
# Split the text wherever there's a space.
words = text.split()
print(words)

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']


In [14]:
paragraph = 'The quick brown fox jumps over the lazy dog. The quick brown dog jumps over the lazy fox' 

# Split the text wherever there's a full stop.
a,b = paragraph.split('.')

# Display the results.
print(a)
print(b)

The quick brown fox jumps over the lazy dog
 The quick brown dog jumps over the lazy fox


In [15]:
cars = 'Audi and Kia and BMW and Volvo and Opel'

# maxsplit of 1
print(cars.split(' and ', 1))
# maxsplit of 2
print(cars.split(' and ', 2))

['Audi', 'Kia and BMW and Volvo and Opel']
['Audi', 'Kia', 'BMW and Volvo and Opel']
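
`maxsplit` is handy when only the first or last piece of a string matters. The sketch below applies it to one of the names from the earlier example; `rsplit` is the standard string method that counts splits from the right instead of the left. The variable names are illustrative.

```python
full_name = "Luther N. Gonzalez"

# maxsplit=1 cuts only at the first space: first name, then everything else.
first, remainder = full_name.split(" ", 1)
print(first)      # Luther
print(remainder)  # N. Gonzalez

# rsplit with maxsplit=1 cuts only at the last space: everything else, then last name.
head, last = full_name.rsplit(" ", 1)
print(head)  # Luther N.
print(last)  # Gonzalez
```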


**Splitting on a DataFrame**

In [0]:
import pandas as pd
import numpy as np


In [0]:
df=pd.read_csv('/content/all_seasons.csv')

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,draft_number,gp,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season
0,0,Dennis Rodman,CHI,36.0,198.12,99.79024,Southeastern Oklahoma State,USA,1986,2,27,55,5.7,16.1,3.1,16.1,0.186,0.323,0.1,0.479,0.113,1996-97
1,1,Dwayne Schintzius,LAC,28.0,215.9,117.93392,Florida,USA,1990,1,24,15,2.3,1.5,0.3,12.3,0.078,0.151,0.175,0.43,0.048,1996-97
2,2,Earl Cureton,TOR,39.0,205.74,95.25432,Detroit Mercy,USA,1979,3,58,9,0.8,1.0,0.4,-2.1,0.105,0.102,0.103,0.376,0.148,1996-97
3,3,Ed O'Bannon,DAL,24.0,203.2,100.697424,UCLA,USA,1995,1,9,64,3.7,2.3,0.6,-8.7,0.06,0.149,0.167,0.399,0.077,1996-97
4,4,Ed Pinckney,MIA,34.0,205.74,108.86208,Villanova,USA,1985,1,10,27,2.4,2.4,0.2,-11.2,0.109,0.179,0.127,0.611,0.04,1996-97


In [0]:
# Extract the first name from the 'player_name' column:
# split on spaces and keep the first token.

df['first_name'] = df.player_name.str.split(" ").str[0]

In [8]:
df.first_name.head()

0    Dennis
1    Dwayne
2      Earl
3        Ed
4        Ed
Name: first_name, dtype: object

In [0]:
# Extract the last name from the 'player_name' column:
# split on spaces and keep the last token.
df['last_name'] = df.player_name.str.split(" ").str[-1]


In [12]:
df.last_name.head()

0        Rodman
1    Schintzius
2       Cureton
3      O'Bannon
4      Pinckney
Name: last_name, dtype: object
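
Both columns can also be produced in a single step with `expand=True`, and the same idea applies to other string columns such as `season`. The sketch below is one possible variant, not the approach used in the cells above. It uses a few rows copied from the `df.head()` output so it runs without the CSV file; `sample` and `season_start` are names introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the df.head() output shown earlier.
sample = pd.DataFrame({
    "player_name": ["Dennis Rodman", "Dwayne Schintzius", "Ed O'Bannon"],
    "season": ["1996-97", "1996-97", "1996-97"],
})

# n=1 splits at the first space only; expand=True returns one column per piece.
sample[["first_name", "last_name"]] = sample["player_name"].str.split(
    " ", n=1, expand=True
)

# Turn the season string into a numeric start year that can be binned or grouped.
sample["season_start"] = sample["season"].str.split("-").str[0].astype(int)

print(sample)
```

With `n=1`, everything after the first space lands in `last_name`, so a multi-word surname stays together, whereas taking the last token keeps only its final word. Which behavior is preferable depends on the names in the data.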