# String Methods

### Loading some data

In this lesson, we'll work with data about Midwestern cities.  The data is available on [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_Midwestern_cities_by_size), but unfortunately it is not formatted in a way that pandas can understand.

So instead, we copied and pasted the data below and assigned it to a variable.

In [2]:
cities_text = """Chicago, Illinois; 2,722,586
Indianapolis, Indiana; 853,431
Columbus, Ohio; 852,144
Detroit, Michigan; 679,865
Milwaukee, Wisconsin; 599,086
Kansas City, Missouri; 476,974
Omaha, Nebraska; 463,081
Minneapolis, Minnesota; 411,452
Wichita, Kansas; 389,054
Cleveland, Ohio; 388,812
St. Louis, Missouri; 314,867
St. Paul, Minnesota; 300,820
Cincinnati, Ohio; 298,957
Toledo, Ohio; 279,455
Lincoln, Nebraska; 277,315
Fort Wayne, Indiana; 262,450
Madison, Wisconsin; 248,856
Des Moines, Iowa; 214,778
Aurora, Illinois; 200,946
Grand Rapids, Michigan; 195,355
Akron, Ohio; 198,252
Overland Park, Kansas; 186,147
Sioux Falls, South Dakota; 170,401
Springfield, Missouri; 165,785
Kansas City, Kansas; 151,042
Rockford, Illinois; 148,640
Joliet, Illinois; 148,172
Naperville, Illinois; 146,431
Dayton, Ohio; 140,939
Warren, Michigan; 135,147
Olathe, Kansas; 134,368
Sterling Heights, Michigan; 131,996
Cedar Rapids, Iowa; 130,330
Topeka, Kansas; 127,139
Evansville, Indiana; 119,806
Ann Arbor, Michigan; 119,303
Columbia, Missouri; 118,620
Fargo, North Dakota; 118,099
Independence, Missouri; 117,369
Springfield, Illinois; 116,313
Peoria, Illinois; 115,424
Lansing, Michigan; 115,222
Rochester, Minnesota; 112,683
Elgin, Illinois; 112,628
Green Bay, Wisconsin; 104,796
Davenport, Iowa; 102,268
South Bend, Indiana; 101,928"""

> Notice that we wrapped the string in triple quotes so that our string could span multiple lines.

To begin, select just the first 100 characters from the string to take a closer look at the data.

In [3]:
cities_text[:100]
# 'Chicago, Illinois; 2,722,586\nIndianapolis, Indiana; 853,431\nColumbus, Ohio;
# 852,144\nDetroit, Michiga'

'Chicago, Illinois; 2,722,586\nIndianapolis, Indiana; 853,431\nColumbus, Ohio; 852,144\nDetroit, Michiga'

So we can see that the pattern for each row of data is:
    
`City, State; 9,999,999\n`

Currently, our data is simply one long string.  Calculate the number of characters in the string.

In [4]:
len(cities_text)

# 1313

1313

Ok, let's begin to organize this string a bit better.  We'll start by splitting the data into separate lines.

> It may feel a little tricky, but try some different methods.  You can always google if you get stuck.

In [6]:
city_rows = cities_text.split('\n')

In [7]:
city_rows[:3]

# ['Chicago, Illinois; 2,722,586',
#  'Indianapolis, Indiana; 853,431',
#  'Columbus, Ohio; 852,144']

['Chicago, Illinois; 2,722,586',
 'Indianapolis, Indiana; 853,431',
 'Columbus, Ohio; 852,144']
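
As a side note, Python strings also have a `splitlines` method that could be used for this step.  The sketch below uses a small `sample_text` (an illustrative three-row subset, not the full dataset) to compare it with `split('\n')`.

```python
# A small sample in the same format as cities_text (not the full dataset).
sample_text = """Chicago, Illinois; 2,722,586
Indianapolis, Indiana; 853,431
Columbus, Ohio; 852,144"""

rows_split = sample_text.split('\n')
rows_splitlines = sample_text.splitlines()

# For this sample, both approaches give the same list of rows.
print(rows_split == rows_splitlines)  # True

# They differ when the text ends with a newline:
print('a\nb\n'.split('\n'))    # ['a', 'b', '']
print('a\nb\n'.splitlines())   # ['a', 'b']
```

Because our text does not end with a newline, either method works here.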

Next, let's select the first element from the list of strings.

In [8]:
city = city_rows[0]


city
# 'Chicago, Illinois; 2,722,586'

'Chicago, Illinois; 2,722,586'

Ideally, we can organize this as a dictionary with the city, state, and population values.

First, use `split` to divide the data between `city_and_state` and `population`.

In [9]:
city_and_state = city.split('; ')[0]

population = city.split('; ')[1]

In [10]:
city_and_state
# 'Chicago, Illinois'

'Chicago, Illinois'

In [11]:
population

# '2,722,586'

'2,722,586'
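
Since splitting on `'; '` gives exactly two pieces for a row in this format, the two assignments could also be written as a single unpacking.  This is just an alternative sketch of the same step, not a required approach.

```python
city = 'Chicago, Illinois; 2,722,586'

# split('; ') returns a two-element list here, so we can unpack it directly.
city_and_state, population = city.split('; ')

print(city_and_state)  # Chicago, Illinois
print(population)      # 2,722,586
```

Unpacking raises a `ValueError` if a row does not split into exactly two pieces, which can be a helpful early warning about malformed data.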

Now separate `city_and_state` into `city_name` and `state`.

In [12]:


# city_and_state.split(', ') returns ['Chicago', 'Illinois']
city_name = city_and_state.split(', ')[0]

state = city_and_state.split(', ')[1]

In [13]:
city_name

# 'Chicago'

'Chicago'

In [14]:
state

# 'Illinois'

'Illinois'
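
One detail worth noticing is the separator we pass to `split`.  The short sketch below, using the same sample value, shows that splitting on `','` alone leaves a leading space on the state, which `strip` can remove.

```python
city_and_state = 'Chicago, Illinois'

# Splitting on ',' keeps the space that follows the comma.
print(city_and_state.split(','))   # ['Chicago', ' Illinois']

# Splitting on ', ' removes the comma and the space together.
print(city_and_state.split(', '))  # ['Chicago', 'Illinois']

# strip() removes surrounding whitespace if we split on ',' alone.
state = city_and_state.split(',')[1].strip()
print(state)                       # Illinois
```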

### Now a bit harder

Ok, now that we have explored how to work on an individual row, let's apply this to all of the data.  Write code that starts with the text in our variable `cities_text` and ends with a variable `cities`, where `cities` is a list of dictionaries with keys of `city`, `state`, and `population`.

In [15]:
cities_text[:100]

# 'Chicago, Illinois; 2,722,586\nIndianapolis, Indiana; 853,431\nColumbus, Ohio;
# 852,144\nDetroit, Michiga'

'Chicago, Illinois; 2,722,586\nIndianapolis, Indiana; 853,431\nColumbus, Ohio; 852,144\nDetroit, Michiga'

In [16]:
# fill in code here
cities_clean = cities_text.split('\n')
cities = []
for city in cities_clean:
    city_state = city.split('; ')[0]
    population = city.split('; ')[1]
    name = city_state.split(',')[0]
    state = city_state.split(', ')[1]
    new = {'city': name, 'state': state, 'population': population}
    cities.append(new)
print(cities)

[{'city': 'Chicago', 'state': 'Illinois', 'population': '2,722,586'}, {'city': 'Indianapolis', 'state': 'Indiana', 'population': '853,431'}, {'city': 'Columbus', 'state': 'Ohio', 'population': '852,144'}, {'city': 'Detroit', 'state': 'Michigan', 'population': '679,865'}, {'city': 'Milwaukee', 'state': 'Wisconsin', 'population': '599,086'}, {'city': 'Kansas City', 'state': 'Missouri', 'population': '476,974'}, {'city': 'Omaha', 'state': 'Nebraska', 'population': '463,081'}, {'city': 'Minneapolis', 'state': 'Minnesota', 'population': '411,452'}, {'city': 'Wichita', 'state': 'Kansas', 'population': '389,054'}, {'city': 'Cleveland', 'state': 'Ohio', 'population': '388,812'}, {'city': 'St. Louis', 'state': 'Missouri', 'population': '314,867'}, {'city': 'St. Paul', 'state': 'Minnesota', 'population': '300,820'}, {'city': 'Cincinnati', 'state': 'Ohio', 'population': '298,957'}, {'city': 'Toledo', 'state': 'Ohio', 'population': '279,455'}, {'city': 'Lincoln', 'state': 'Nebraska', 'population':

In [17]:
cities[:3]

# [{'city': 'Chicago', 'state': 'Illinois', 'population': '2,722,586'},
#  {'city': 'Indianapolis', 'state': 'Indiana', 'population': '853,431'},
#  {'city': 'Columbus', 'state': 'Ohio', 'population': '852,144'}]

[{'city': 'Chicago', 'state': 'Illinois', 'population': '2,722,586'},
 {'city': 'Indianapolis', 'state': 'Indiana', 'population': '853,431'},
 {'city': 'Columbus', 'state': 'Ohio', 'population': '852,144'}]

Now, one issue with the code above is that we would really like the population to be an integer, not a string.  So copy the code that we have above into the cell below, and update the code to make population an integer.

> See if you can use the `join` or the `replace` method to accomplish this step.
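
To make the hint concrete, here is a minimal sketch of both options on a single population string.  The variable names (`population_text`, `with_replace`, `with_join`) are just illustrative.

```python
population_text = '2,722,586'

# Option 1: replace every comma with an empty string.
with_replace = population_text.replace(',', '')

# Option 2: split on commas, then join the pieces back together.
with_join = ''.join(population_text.split(','))

print(with_replace)       # 2722586
print(with_join)          # 2722586

# Once the commas are gone, int() can convert the string to a number.
print(int(with_replace))  # 2722586
```

Calling `int('2,722,586')` directly raises a `ValueError`, which is why the commas have to be removed first.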

In [18]:
# fill in code here
cities_clean = cities_text.split('\n')
cities = []
for city in cities_clean:
    city_state = city.split('; ')[0]
    population = int(city.split('; ')[1].replace(',', ''))
    name = city_state.split(',')[0]
    state = city_state.split(', ')[1]
    new = {'city': name, 'state': state, 'population': population}
    cities.append(new)
print(cities)

[{'city': 'Chicago', 'state': 'Illinois', 'population': 2722586}, {'city': 'Indianapolis', 'state': 'Indiana', 'population': 853431}, {'city': 'Columbus', 'state': 'Ohio', 'population': 852144}, {'city': 'Detroit', 'state': 'Michigan', 'population': 679865}, {'city': 'Milwaukee', 'state': 'Wisconsin', 'population': 599086}, {'city': 'Kansas City', 'state': 'Missouri', 'population': 476974}, {'city': 'Omaha', 'state': 'Nebraska', 'population': 463081}, {'city': 'Minneapolis', 'state': 'Minnesota', 'population': 411452}, {'city': 'Wichita', 'state': 'Kansas', 'population': 389054}, {'city': 'Cleveland', 'state': 'Ohio', 'population': 388812}, {'city': 'St. Louis', 'state': 'Missouri', 'population': 314867}, {'city': 'St. Paul', 'state': 'Minnesota', 'population': 300820}, {'city': 'Cincinnati', 'state': 'Ohio', 'population': 298957}, {'city': 'Toledo', 'state': 'Ohio', 'population': 279455}, {'city': 'Lincoln', 'state': 'Nebraska', 'population': 277315}, {'city': 'Fort Wayne', 'state': '

In [19]:
cities[:3]

# [{'city': 'Chicago', 'state': 'Illinois', 'population': 2722586},
#  {'city': 'Indianapolis', 'state': 'Indiana', 'population': 853431},
#  {'city': 'Columbus', 'state': 'Ohio', 'population': 852144}]

[{'city': 'Chicago', 'state': 'Illinois', 'population': 2722586},
 {'city': 'Indianapolis', 'state': 'Indiana', 'population': 853431},
 {'city': 'Columbus', 'state': 'Ohio', 'population': 852144}]

Ok, now that's better.
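
As one possible next step, the per-row logic could be moved into a function and applied to every row with a list comprehension.  The sketch below assumes a hypothetical helper named `parse_city_row` and uses a three-row `sample_text` rather than the full dataset.

```python
def parse_city_row(row):
    """Turn 'City, State; 1,234' into a dictionary (hypothetical helper)."""
    city_and_state, population_text = row.split('; ')
    name, state = city_and_state.split(', ')
    population = int(population_text.replace(',', ''))
    return {'city': name, 'state': state, 'population': population}


# A small sample in the same format as cities_text (not the full dataset).
sample_text = """Chicago, Illinois; 2,722,586
Kansas City, Missouri; 476,974
Sioux Falls, South Dakota; 170,401"""

cities = [parse_city_row(row) for row in sample_text.split('\n')]

print(cities[0])
# {'city': 'Chicago', 'state': 'Illinois', 'population': 2722586}
```

Keeping the row logic in one function mirrors the workflow in this lesson: get it working on a single row first, then apply it to all of them.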

### Summary

In this lesson, we practiced using the `split` and `replace` methods to start with some raw text and organize it into an easier-to-use list of dictionaries.

Also pay attention to the workflow that we used.  

1. Perform on one, then all

Above, we first selected a single row of data.  

In [20]:
city_rows[0]

'Chicago, Illinois; 2,722,586'

Then we worked on making progress with just that single row of data.  Only after feeling comfortable working through the problem on a single element did we move on to looping through all of the data.

2. Iterate on our solution

In working through the loop of data, we were first satisfied with a partially working solution: a list of dictionaries where every value was a string.  Only after getting that working did we move on to converting our population into an integer.

> Finding ways to break our coding problems down into smaller, easier steps will be a critical skill moving forward.