### Cleaning our Data

Above, we learned about `and` and `or`. We also saw that one way to work with messy data is to account for each variation with an `or`.

In [None]:
musical_usernames = []

for user in users:
    if user['Industry'] == 'Music' or user['Industry'] == 'music':
        musical_usernames.append(user['Name'])

But an even better way to work with messy data is to clean it up.  This way, we don't have to remember that industries like music are sometimes upper case and sometimes lower case.

So how do we clean our data? Well, it depends on the exact problem we are dealing with. Here, our problem is that sometimes our industry is capitalized and sometimes it is not.

So let's ask Google how to capitalize a string, which is the programmer's word for text.

<img src="./cap-string.png" width="70%">

And we can test it out for ourselves.

In [None]:
'music'.capitalize()

'Music'

And when our text is already capitalized, we just get back an identical string.

In [None]:
'Music'.capitalize()

'Music'
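It helps to see a few more inputs before relying on `capitalize`. The sketch below uses a made-up list of strings (the name `examples` is only for illustration) to show that Python's `capitalize` upper-cases the first character and lower-cases everything after it.

```python
# capitalize() upper-cases the first character and lower-cases the rest
examples = ['music', 'Music', 'MUSIC', 'space agency']

capitalized = []
for text in examples:
    capitalized.append(text.capitalize())

print(capitalized)
# ['Music', 'Music', 'Music', 'Space agency']
```

Because the rest of the string is lower-cased, `capitalize` suits short labels like an industry better than names with several capitalized words.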

Now suppose we have Adele's record and need to fix her industry. How do we replace the industry with the capitalized version?

In [None]:
adele = {'Name': 'Adele',
  'Followers': 27488867,
  'Following': 0,
  'Tweets': 310,
  'Nationality/headquarters': 'U.K',
  'Industry': 'music'}

Well, remember that we select the industry with the bracket accessor.

In [None]:
adele['Industry']

'music'

And we can update the industry like so.

In [None]:
adele['Industry'] = 'something else'

In [None]:
adele['Industry']

'something else'

So to update an industry to its capitalized version, we combine the two steps: select the current value, capitalize it, and assign the result back.

In [None]:
user = {'Name': 'Adele',
  'Industry': 'music'}

In [None]:
# the right-hand side evaluates to 'music'.capitalize(), which is 'Music'
user['Industry'] = user['Industry'].capitalize()
user['Industry']

'Music'

And if we want to capitalize all of our industry data we can do so with the following:

In [None]:
for user in users:
    # set the user's industry to equal the user's industry, capitalized
    user['Industry'] = user['Industry'].capitalize()

The above goes through each dictionary and updates the industry to the capitalized version. What if the industry is already capitalized? Then the loop still assigns the capitalized version, so a value like `'Music'` simply stays the same.

In [None]:
'Music'.capitalize()

'Music'
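To try the loop without the full dataset, here is a minimal sketch. The `sample_users` list is a made-up stand-in for `users`, with only the two keys the loop needs.

```python
# A small made-up sample standing in for the full `users` list
sample_users = [
    {'Name': 'Adele', 'Industry': 'music'},    # lower case, needs fixing
    {'Name': 'Rihanna', 'Industry': 'Music'},  # already capitalized
    {'Name': 'Example News', 'Industry': 'news'},
]

for user in sample_users:
    # reassign each industry to its capitalized version
    user['Industry'] = user['Industry'].capitalize()

sample_industries = []
for user in sample_users:
    sample_industries.append(user['Industry'])

print(sample_industries)
# ['Music', 'Music', 'News']
```

Running the loop a second time changes nothing, so it is safe to re-run the cleaning cell.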

So once we capitalize everything, we can see that the capitalization of our data is now consistent.

In [None]:
industries = []
for user in users:
    industries.append(user['Industry'])
    
industries[:5]

['Politics', 'Music', 'Music', 'Music', 'Music']

In [None]:
print(set(industries))

{'Politics', 'Television', 'News', 'Sports', 'Publishing industry', 'Space agency', 'Business', 'Technology ', 'Music', 'Films/entertainment'}
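One value in the set above, `'Technology '`, ends with a space, so a check like `== 'Technology'` would miss it. Capitalizing does not fix that. A possible extra cleaning step, which is an assumption on our part and not part of the walkthrough above, is to call `strip` to remove leading and trailing whitespace before capitalizing.

```python
# 'Technology ' has a trailing space, as seen in the set of industries
industry = 'Technology '

# strip() removes whitespace at both ends; capitalize() then fixes the case
cleaned = industry.strip().capitalize()

print(repr(cleaned))
# 'Technology'
```

The same chain could go inside the cleaning loop in place of `capitalize()` alone.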


And because we cleaned our data, we can look for musicians, while being confident that music will always be capitalized.

In [None]:
musical_usernames = []

for user in users:
    if user['Industry'] == 'Music':
        musical_usernames.append(user['Name'])

In [None]:
musical_usernames[:5]

['KATY PERRY', 'Justin Bieber', 'Rihanna', 'Taylor Swift', 'Lady Gaga']

So as you can see, making our data consistent up front can simplify our queries later on.