# String Methods

### Introduction

In this lesson, we'll deepen our understanding of strings.  Often, when we work with data from the Internet, we are really working with strings.  So it's important to know the methods for cleaning up this kind of data, and how to coerce it into whichever data structure is easiest to work with.

### Loading our Data

For this lesson, let's work with data regarding NBA players.  We can go to a website and scrape roster data with something like the following.

> Press `shift + return` on the following.

In [1]:
import pandas as pd

url = "https://www.espn.com/nba/team/roster/_/name/phi"

roster_df = pd.read_html(url)[0]

> Or we can just load the data from the following CSV file.

In [2]:
import pandas as pd
url = "https://raw.githubusercontent.com/eng-6-22/mod-1-fundamentals/master/sixers_roster.csv"
roster_df = pd.read_csv(url)

Ok, let's work with this roster data.

> Before doing so, we'll use just a bit of pandas here to slightly clean up our data.  You can ignore what we're doing below.  Just press `shift + return`.  

In [3]:
players_df = roster_df[['Name', 'POS', 'Age', 'HT', 'WT', 'College', 'Salary']]

Then we convert our data to a list of dictionaries.

In [4]:
players = players_df.to_dict('records')

Now let's look at the data in players.

In [5]:
players[:2]

[{'Name': 'Ryan Broekhoff45',
  'POS': 'SG',
  'Age': 30,
  'HT': '6\' 6"',
  'WT': '215 lbs',
  'College': 'Valparaiso',
  'Salary': '$1,416,852'},
 {'Name': 'Alec Burks20',
  'POS': 'SG',
  'Age': 29,
  'HT': '6\' 6"',
  'WT': '214 lbs',
  'College': 'Colorado',
  'Salary': '$1,620,564'}]

So `players` is a list of dictionaries. And if we look at the first few players, we can see that there are various issues with the text: the jersey number is attached to the name, the weight includes its unit, and the salary contains a dollar sign and commas.  It would be nice if we could programmatically clean up some of this data.

Let's focus on the first NBA player and see how we can do so.

In [6]:
player = players[0]

In [7]:
player

{'Name': 'Ryan Broekhoff45',
 'POS': 'SG',
 'Age': 30,
 'HT': '6\' 6"',
 'WT': '215 lbs',
 'College': 'Valparaiso',
 'Salary': '$1,416,852'}

Now from there we can select the current player name.

In [52]:
name = player['Name']
name

'Ryan Broekhoff45'

Ok, now it's time to focus on cleaning up this data.  To do so, there are a few things about strings we should learn.

### Strings are like lists

One thing that can help us out in working with strings is recognizing that they are pretty similar to lists.  A string is really just a sequence of characters.  And because of that, we can perform similar operations as we would on a list.

For example, below we select the first character from the string.

In [53]:
name[0]

'R'

> So just like with a list, we use the bracket accessors followed by the index.

And, just like in a list, we can also slice a string.  For example, let's slice all but the last two characters from our string above.

In [55]:
name

'Ryan Broekhoff45'

In [56]:
name[0:-2]

'Ryan Broekhoff'
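Slicing off a fixed number of characters only works when every jersey number has two digits. As a rough sketch, assuming the number is always a run of digits at the end of the name, we could use negative indices to pull the number out, or `rstrip` to remove however many trailing digits there are. The name `'Some Player7'` is a made-up example, not a row from the roster.

```python
name = 'Ryan Broekhoff45'

# Negative indices count from the end of the string, just as with a list.
last_char = name[-1]      # '5'
jersey = name[-2:]        # '45'
clean_name = name[:-2]    # 'Ryan Broekhoff'

# rstrip removes any trailing characters that appear in the given set,
# so it also handles a one-digit number (hypothetical example name).
digits = '0123456789'
also_clean = name.rstrip(digits)               # 'Ryan Broekhoff'
single_digit = 'Some Player7'.rstrip(digits)   # 'Some Player'
```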

### Changing between Strings and Lists

1. Splitting our Strings

Oftentimes we'll want to go from a string to a list of words.  We can do so with the `split` method.

In [61]:
name

'Ryan Broekhoff45'

In [60]:
name.split()

['Ryan', 'Broekhoff45']

The default behavior of `split` is to divide the string on whitespace.  But really, we can split by any character that we like.

> For example, let's select the player's salary.

In [16]:
salary = player['Salary']
salary

'$1,416,852'

1. `replace`

Now one way to remove the commas is with the replace method.

In [17]:
salary.replace(',', '')

'$1416852'

So above we are replacing the `,` with an empty string `''`, which effectively removes it.
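Because `replace` returns a new string, calls can be chained. The sketch below, which assumes the salary always looks like `'$1,416,852'`, removes both the commas and the dollar sign and then coerces the result to an integer.

```python
salary = '$1,416,852'

# Each replace call returns a new string, so the calls can be chained.
digits_only = salary.replace(',', '').replace('$', '')
salary_amount = int(digits_only)

# Strings are immutable, so the original value is left unchanged.
print(salary)         # $1,416,852
print(salary_amount)  # 1416852
```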

2. `split`

Another way we can remove the commas is with the `split` method.  Below we'll start with our salary again, and then split by comma.

In [18]:
salary

'$1,416,852'

In [19]:
salary_div = salary.split(',')

salary_div

['$1', '416', '852']

Notice that the `split` method removes the character that we are splitting on, and turns each partition into a separate element in a list.
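Calling `split()` with no argument and calling `split(' ')` behave differently when the text contains extra spaces. The string below is a made-up variation on our player's name, used only to show the difference.

```python
# Made-up input with a double space and a trailing space.
spaced = 'Ryan  Broekhoff45 '

# No argument: split on runs of whitespace and drop empty strings.
default_parts = spaced.split()       # ['Ryan', 'Broekhoff45']

# Explicit ' ': split at every single space, keeping empty strings.
explicit_parts = spaced.split(' ')   # ['Ryan', '', 'Broekhoff45', '']
```

For messy scraped text, the no-argument form is usually the safer choice when we just want the words.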

2. Joining from lists

Now a way to go from list to string is with the join method.  Let's take another look at our `salary_div` list.

In [68]:
salary_div

['$1', '416', '852']

In [67]:
''.join(salary_div)

'$1416852'

So with `join`, we start with the string we are joining by (the separator) and then pass the list as the argument.  If we want to reinsert our commas, we can simply join on a string with a comma.

In [69]:
','.join(salary_div)

'$1,416,852'
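Two details about `join` are worth seeing in code: joining with the same separator we split on gets us back to where we started, and `join` only accepts strings. The `numbers` list below is an illustrative assumption, not part of the roster data.

```python
salary_div = ['$1', '416', '852']

# split and join undo each other when the same separator is used.
rejoined = ','.join(salary_div)           # '$1,416,852'
round_trip = rejoined.split(',')          # ['$1', '416', '852']

# join raises a TypeError on non-string elements,
# so convert each element with str first.
numbers = [1, 416, 852]
joined_numbers = ','.join([str(n) for n in numbers])   # '1,416,852'
```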

So moving between lists and strings can often be a good way to clean up our data.  For example, let's clean up the weight information about a player.

In [74]:
player['WT']

'215 lbs'

In [76]:
player['WT'].split(' ')

['215', 'lbs']

In [79]:
player['WT'].split(' ')[0]

'215'

And now we can even coerce this to an integer.

In [80]:
int(player['WT'].split(' ')[0])

215
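Putting these pieces together, here is one possible way to clean a whole player record. The function name `clean_player` is hypothetical, and the sketch assumes every field is present and formatted like the first record; a row with a missing or differently formatted value would raise an error.

```python
def clean_player(player):
    """Return a copy of a roster record with numeric fields parsed."""
    cleaned = dict(player)  # copy so the original record is untouched

    # '215 lbs' -> 215
    cleaned['WT'] = int(player['WT'].split(' ')[0])

    # '$1,416,852' -> 1416852
    cleaned['Salary'] = int(player['Salary'].replace('$', '').replace(',', ''))

    # 6' 6" -> total inches: drop the inch mark, then split feet from inches
    feet, inches = player['HT'].replace('"', '').split("' ")
    cleaned['HT'] = int(feet) * 12 + int(inches)
    return cleaned


player = {'Name': 'Ryan Broekhoff45', 'POS': 'SG', 'Age': 30,
          'HT': '6\' 6"', 'WT': '215 lbs',
          'College': 'Valparaiso', 'Salary': '$1,416,852'}

cleaned = clean_player(player)
print(cleaned['WT'], cleaned['HT'], cleaned['Salary'])  # 215 78 1416852
```

The same function could then be applied to every record with a list comprehension such as `[clean_player(p) for p in players]`, provided all rows follow this format.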

### Summary

In this lesson, we continued building our understanding of the operations we can perform on strings.  For example, we saw that we can think of a string as a collection of characters.  In that respect, we can select certain characters from a string with our bracket accessors.

In [84]:
name = 'Ryan Broekhoff45'

name[0:-2]

'Ryan Broekhoff'

We also saw that we can use the `replace` method to substitute, or to simply remove, characters from a string.

In [22]:
salary = '$1,416,852'

salary.replace(',', '')

'$1416852'

In addition, we can divide our string by a specified character using split.

In [91]:
name[0:-2].split(' ')

['Ryan', 'Broekhoff']

In [89]:
name[0:-2].split('B')

['Ryan ', 'roekhoff']

Just be aware that the character we split by is removed.  Finally, we can go from a list back to a string with the join method.  

In [92]:
' '.join(['Ryan', 'Broekhoff'])

'Ryan Broekhoff'