# Exploratory Data Analysis Solutions
---

## <font color='red'> Now you try
    
Today we'll be working with a dataset on the gender pay gap across companies in the UK. 

Let's start by loading the dataset into a Pandas `DataFrame`.

1. Use the `read_csv()` Pandas function to read in two files from the `data` directory (which is inside the directory this notebook is in). 

The files have been downloaded from https://gender-pay-gap.service.gov.uk/viewing/download. Here's how you should load each of them in:

* `UK Gender Pay Gap Data - 2019 to 2020.csv`; read this in as a DataFrame called `pay_gap_2019_20`
    
* `UK Gender Pay Gap Data - 2018 to 2019.csv`; read this in as a DataFrame called `pay_gap_2018_19`


2. Use the `head()` method on `pay_gap_2019_20` to visually inspect the data. What's strange about it? Use `read_csv()` again but try playing around with the `header` parameter (e.g. `read_csv(header=5)`) until the final DataFrame looks right. What does the `header` parameter do?


3. Continue to inspect `pay_gap_2019_20` visually and figure out:

    
* What the data contains
 
* What each column corresponds to
    
* What each row corresponds to
    

4. Use `shape` to figure out how many rows are in `pay_gap_2019_20` and `pay_gap_2018_19`. Use the Gender Pay Gap Service website to explain why there's a difference in size.



5. List as many potential data quality issues as you can in `pay_gap_2019_20`.
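
To see what the `header` parameter in step 2 does, here is a minimal sketch on a made-up CSV held in memory. The file contents and the names `raw_text`, `naive` and `fixed` are assumptions for illustration, not the real pay gap file.

```python
import io
import pandas as pd

# Made-up CSV text: two preamble lines, then the real header row, then data.
raw_text = (
    "Example export,\n"
    "Generated 2020,\n"
    "EmployerName,EmployerSize\n"
    "ACME LIMITED,250 to 499\n"
    "WIDGETS PLC,500 to 999\n"
)

# Default (header=0): the first line is used as the column names,
# so the preamble ends up as the header and the real header becomes data.
naive = pd.read_csv(io.StringIO(raw_text))

# header=2: the line at position 2 (counting from 0) is used as the column names,
# and everything above it is discarded.
fixed = pd.read_csv(io.StringIO(raw_text), header=2)

print(naive.columns.tolist())
print(fixed.columns.tolist())
```

In other words, `header` tells pandas which row holds the column names; rows before it are skipped.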

---

In [2]:
import pandas as pd

pay_gap_2019_20 = pd.read_csv('./data/UK Gender Pay Gap Data - 2019 to 2020.csv',header=2)
pay_gap_2018_19 = pd.read_csv('./data/UK Gender Pay Gap Data - 2018 to 2019.csv')
                              

In [3]:
pay_gap_2019_20.head()

Unnamed: 0,EmployerName,Address,CompanyNumber,SicCodes,DiffMeanHourlyPercent,DiffMedianHourlyPercent,DiffMeanBonusPercent,DiffMedianBonusPercent,MaleBonusPercent,FemaleBonusPercent,...,FemaleUpperMiddleQuartile,MaleTopQuartile,FemaleTopQuartile,CompanyLinkToGPGInfo,ResponsiblePerson,EmployerSize,CurrentName,SubmittedAfterTheDeadline,DueDate,DateSubmitted
0,1ST CHOICE STAFF RECRUITMENT LIMITED,"1ST CHOICE RECRUITMENT,\r\n8 St. Loyes Street,...",7972006,78109,-2.3,0.0,-114.8,-249.3,1.1,0.4,...,37.1,50.0,50.0,https://www.1stchoice.net/gender-pay-gap-repor...,Gill Knight (MD),250 to 499,1ST CHOICE STAFF RECRUITMENT LIMITED,False,05/04/2020 00:00,24/01/2020 09:37
1,23.5 DEGREES LIMITED,"Unit 3 Hedge End Retail Park, Charles Watts Wa...",8014079,56103,10.0,0.0,79.0,35.0,4.0,2.0,...,70.0,31.0,69.0,https://www.23-5degrees.com/gender-pay-gap,Luca Contardo (CFO),500 to 999,23.5 DEGREES LIMITED,False,05/04/2020 00:00,11/11/2019 15:33
2,A. & B. GLASS COMPANY LIMITED,"Addison Road,\r\nChilton Industrial Estate,\r\...",1543721,43342,19.0,4.0,42.0,45.0,70.0,41.0,...,24.0,90.0,10.0,,PHILIP FARNELL (GROUP HR MANAGER),250 to 499,A. & B. GLASS COMPANY LIMITED,False,05/04/2020 00:00,20/05/2019 16:34
3,A.B.M. CATERING LIMITED,"Eagle Court,\r\n63-67 Saltisford,\r\nWarwick,\...",4168334,"56290,\r\n70100",21.7,16.5,-70.3,61.5,15.1,6.5,...,87.8,33.9,66.1,http://www.abmcatering.co.uk/wp-content/upload...,Sue Hill (Finance & HR Director - Operations),1000 to 4999,A.B.M. CATERING LIMITED,False,05/04/2020 00:00,29/01/2020 12:20
4,A.G. BARR P.L.C.,"Westfield House,\r\n4 Mollins Road,\r\nCumbern...",SC005653,11070,2.3,-6.8,41.9,-3.0,93.1,94.3,...,24.0,65.0,35.0,https://www.agbarr.co.uk/responsibility/we-act...,Doug Brown (Head of Human Resources),500 to 999,A.G. BARR P.L.C.,False,05/04/2020 00:00,23/01/2020 11:57


In [4]:
pay_gap_2019_20.shape

(992, 25)

In [5]:
pay_gap_2018_19.shape

(10812, 25)

## <font color='red'> Now you try
    
1. Use pandas to read in the file `county_facts.csv` from the `data` folder, as a DataFrame called `county_data`. 


2. Visually inspect the DataFrame. What's the main problem with this data?


3. Luckily, we have a data dictionary that can help us! Below is some pre-written code that loads the data dictionary as a DataFrame, then converts it to a list of dictionaries.

You'll see that the dictionary translates the column names in `county_data` to longer names. Use this dictionary to rename the columns in `county_data` in **snake case**, i.e. with each column name in lowercase and words separated by underscores. For example, the column titled `PST045214` in `county_data` should be renamed `population_2014_estimate`.

**There are more and less efficient ways of completing this task. Try to go for the method that's smart but lazy. Hint: it might involve a `for` loop around the dictionary, and the `.lower()` and `.join()` string methods...**

4. Figure out how to use the `to_csv()` method to save the DataFrame with renamed columns as a file called `county_data_clean.csv`. Make sure the file is saved in the `data` directory in this folder.

In [14]:
county_data_dictionary = pd.read_csv('./data/county_facts_dictionary.csv').to_dict('records')
county_data_dictionary


[{'column_name': 'PST045214', 'description': 'Population 2014 estimate'},
 {'column_name': 'PST040210',
  'description': 'Population 2010 April 1 estimates base'},
 {'column_name': 'PST120214',
  'description': 'Population percent change  April 1 2010 to July 1 2014'},
 {'column_name': 'POP010210', 'description': 'Population 2010'},
 {'column_name': 'AGE135214',
  'description': 'Persons under 5 years percent 2014'},
 {'column_name': 'AGE295214',
  'description': 'Persons under 18 years percent 2014'},
 {'column_name': 'AGE775214',
  'description': 'Persons 65 years and over percent 2014'},
 {'column_name': 'SEX255214', 'description': 'Female persons percent 2014'},
 {'column_name': 'RHI125214', 'description': 'White alone percent 2014'},
 {'column_name': 'RHI225214',
  'description': 'Black or African American alone percent 2014'},
 {'column_name': 'RHI325214',
  'description': 'American Indian and Alaska Native alone percent 2014'},
 {'column_name': 'RHI425214', 'description': 'Asian

In [28]:
county_data = pd.read_csv('./data/county_facts.csv')
county_data.head()

Unnamed: 0,fips,area_name,state_abbreviation,PST045214,PST040210,PST120214,POP010210,AGE135214,AGE295214,AGE775214,...,SBO415207,SBO015207,MAN450207,WTN220207,RTN130207,RTN131207,AFN120207,BPS030214,LND110210,POP060210
0,1001,Autauga County,AL,55395,54571,1.5,54571,6.0,25.2,13.8,...,0.7,31.7,0,0,598175,12003,88157,131,594.44,91.8
1,1003,Baldwin County,AL,200111,182265,9.8,182265,5.6,22.2,18.7,...,1.3,27.3,1410273,0,2966489,17166,436955,1384,1589.78,114.6
2,1005,Barbour County,AL,26887,27457,-2.1,27457,5.7,21.2,16.5,...,0.0,27.0,0,0,188337,6334,0,8,884.88,31.0
3,1007,Bibb County,AL,22506,22919,-1.8,22915,5.3,21.0,14.8,...,0.0,0.0,0,0,124707,5804,10757,19,622.58,36.8
4,1009,Blount County,AL,57719,57322,0.7,57322,6.1,23.6,17.0,...,0.0,23.2,341544,0,319700,5622,20941,3,644.78,88.9


In [29]:
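# Start from the current column names so unmapped columns (e.g. fips) keep their names
column_names = county_data.columns.tolist()

for entry in county_data_dictionary:
    old_name = entry['column_name']
    # Lowercase the description and join its words with underscores (snake case)
    new_name = '_'.join(entry['description'].lower().split())
    # Swap the coded name for the readable one; raises ValueError if old_name is not a column
    column_names[column_names.index(old_name)] = new_name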

In [30]:
county_data.columns = column_names

county_data.head()

Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,hispanicowned_firms_percent_2007,womenowned_firms_percent_2007,manufacturers_shipments_2007_$1000,merchant_wholesaler_sales_2007_$1000,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010
0,1001,Autauga County,AL,55395,54571,1.5,54571,6.0,25.2,13.8,...,0.7,31.7,0,0,598175,12003,88157,131,594.44,91.8
1,1003,Baldwin County,AL,200111,182265,9.8,182265,5.6,22.2,18.7,...,1.3,27.3,1410273,0,2966489,17166,436955,1384,1589.78,114.6
2,1005,Barbour County,AL,26887,27457,-2.1,27457,5.7,21.2,16.5,...,0.0,27.0,0,0,188337,6334,0,8,884.88,31.0
3,1007,Bibb County,AL,22506,22919,-1.8,22915,5.3,21.0,14.8,...,0.0,0.0,0,0,124707,5804,10757,19,622.58,36.8
4,1009,Blount County,AL,57719,57322,0.7,57322,6.1,23.6,17.0,...,0.0,23.2,341544,0,319700,5622,20941,3,644.78,88.9
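
An alternative to the index-based loop is to build a mapping from old names to new names and hand it to `DataFrame.rename`. This is a sketch on a small stand-in frame (`toy`) using two entries copied from the data dictionary output; it is one possible approach, not the only one.

```python
import pandas as pd

# Two entries copied from the data dictionary shown earlier.
dictionary_records = [
    {"column_name": "PST045214", "description": "Population 2014 estimate"},
    {"column_name": "PST120214",
     "description": "Population percent change  April 1 2010 to July 1 2014"},
]

# Small stand-in for county_data, using values from the first two rows displayed above.
toy = pd.DataFrame({
    "fips": [1001, 1003],
    "PST045214": [55395, 200111],
    "PST120214": [1.5, 9.8],
})

# split() with no argument collapses repeated spaces, so double spaces
# in a description do not produce double underscores.
name_map = {
    record["column_name"]: "_".join(record["description"].lower().split())
    for record in dictionary_records
}

# rename() only touches columns that appear in the mapping and returns a new DataFrame.
renamed = toy.rename(columns=name_map)
print(renamed.columns.tolist())
```

Columns that are not in the mapping (here `fips`) are left untouched, and keys that do not match any column are ignored by default.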


In [31]:
county_data.to_csv('./data/county_data_clean.csv',index=False,encoding='utf-8')
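
The `index=False` argument matters when the file is read back in. A minimal sketch, writing to in-memory buffers instead of the `data` directory so nothing on disk is touched (the frame `toy` and the buffer names are assumptions for illustration):

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "area_name": ["Autauga County", "Baldwin County"],
    "state_abbreviation": ["AL", "AL"],
})

# Default behaviour: the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())
print(reloaded_without_index.columns.tolist())
```

Without `index=False`, the reloaded data picks up a stray `Unnamed: 0` column.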

---
## <font color='red'> Now you try
    
1. Using **two** existing columns in `county_data`, work out an estimate of the area of each county in square miles. Store these estimates in a new column called `approximate_area`. What are the units? 


2. Create a new column that calculates the **difference** or **error** between your estimates of area, and the actual values in the `land_area_in_square_miles_2010` column.


3. Figure out how to use the `sum()` method to work out the total area of all counties in the dataset.


4. Figure out how to use the `mean()` method to work out the mean population across all counties.


5. Calculate a new column called `predicted_population_2018` that estimates the 2018 population of each county, assuming the same growth rate in population seen between 2010 and 2014.

---

In [33]:
county_data['approximate_area']= county_data['population_2010_april_1_estimates_base']/county_data['population_per_square_mile_2010']
county_data.head()


Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,womenowned_firms_percent_2007,manufacturers_shipments_2007_$1000,merchant_wholesaler_sales_2007_$1000,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area
0,1001,Autauga County,AL,55395,54571,1.5,54571,6.0,25.2,13.8,...,31.7,0,0,598175,12003,88157,131,594.44,91.8,594.455338
1,1003,Baldwin County,AL,200111,182265,9.8,182265,5.6,22.2,18.7,...,27.3,1410273,0,2966489,17166,436955,1384,1589.78,114.6,1590.445026
2,1005,Barbour County,AL,26887,27457,-2.1,27457,5.7,21.2,16.5,...,27.0,0,0,188337,6334,0,8,884.88,31.0,885.709677
3,1007,Bibb County,AL,22506,22919,-1.8,22915,5.3,21.0,14.8,...,0.0,0,0,124707,5804,10757,19,622.58,36.8,622.798913
4,1009,Blount County,AL,57719,57322,0.7,57322,6.1,23.6,17.0,...,23.2,341544,0,319700,5622,20941,3,644.78,88.9,644.791901


In [34]:
county_data['land_area_error'] = county_data['land_area_in_square_miles_2010']-county_data['approximate_area']
county_data.head()

Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,manufacturers_shipments_2007_$1000,merchant_wholesaler_sales_2007_$1000,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area,land_area_error
0,1001,Autauga County,AL,55395,54571,1.5,54571,6.0,25.2,13.8,...,0,0,598175,12003,88157,131,594.44,91.8,594.455338,-0.015338
1,1003,Baldwin County,AL,200111,182265,9.8,182265,5.6,22.2,18.7,...,1410273,0,2966489,17166,436955,1384,1589.78,114.6,1590.445026,-0.665026
2,1005,Barbour County,AL,26887,27457,-2.1,27457,5.7,21.2,16.5,...,0,0,188337,6334,0,8,884.88,31.0,885.709677,-0.829677
3,1007,Bibb County,AL,22506,22919,-1.8,22915,5.3,21.0,14.8,...,0,0,124707,5804,10757,19,622.58,36.8,622.798913,-0.218913
4,1009,Blount County,AL,57719,57322,0.7,57322,6.1,23.6,17.0,...,341544,0,319700,5622,20941,3,644.78,88.9,644.791901,-0.011901
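
Dividing people by people per square mile gives square miles, which answers the units question. The errors are small but not zero; one plausible explanation (an assumption, not something the dataset states) is that the density column is rounded to one decimal place. The sketch below checks that idea on two counties whose values are copied from rows displayed in this notebook.

```python
import pandas as pd

# Values copied from rows displayed in this notebook.
sample = pd.DataFrame({
    "area_name": ["Autauga County", "La Paz County"],
    "population_2010_april_1_estimates_base": [54571, 20489],
    "population_per_square_mile_2010": [91.8, 4.6],
    "land_area_in_square_miles_2010": [594.44, 4499.63],
})

population = sample["population_2010_april_1_estimates_base"]
density = sample["population_per_square_mile_2010"]

# people / (people per square mile) = square miles
sample["approximate_area"] = population / density

# Assumption: density was rounded to one decimal place, so the unrounded
# value lies within 0.05 of the published figure.
sample["area_lower_bound"] = population / (density + 0.05)
sample["area_upper_bound"] = population / (density - 0.05)

# Does the published land area fall inside the range allowed by rounding?
sample["within_rounding_bounds"] = sample["land_area_in_square_miles_2010"].between(
    sample["area_lower_bound"], sample["area_upper_bound"]
)

print(sample[["area_name", "approximate_area", "within_rounding_bounds"]])
```

Rounding has a bigger effect where density is low, which would be consistent with sparsely populated counties showing the largest errors.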


In [35]:
county_data['land_area_in_square_miles_2010'].sum()

3531907.01

In [36]:
county_data['population_2014_estimate'].mean()

101449.90645879733

In [39]:
scale_factor = 1+(county_data['population_percent_change_april_1_2010_to_july_1_2014']/100)

county_data['predicted_population_2018'] = county_data['population_2014_estimate']*scale_factor
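
The calculation above turns a percentage change into a multiplier and applies it once to the 2014 estimate. Here is a small worked sketch using the first two counties displayed in this notebook. The annualised variant at the end is an optional refinement and an assumption on my part: it treats April 2010 to July 2014 as 4.25 years and projects 4 years ahead.

```python
import pandas as pd

# Values copied from the first two rows displayed in this notebook.
sample = pd.DataFrame({
    "area_name": ["Autauga County", "Baldwin County"],
    "population_2014_estimate": [55395, 200111],
    "population_percent_change_april_1_2010_to_july_1_2014": [1.5, 9.8],
})

# A 1.5% increase is a multiplier of 1.015; a -2.1% change would be 0.979.
growth_factor = 1 + sample["population_percent_change_april_1_2010_to_july_1_2014"] / 100

# Same approach as the solution: apply the 2010-2014 multiplier once.
sample["predicted_population_2018"] = sample["population_2014_estimate"] * growth_factor

# Optional refinement (assumption): scale the multiplier to a 4-year horizon,
# since the observed period is roughly 4.25 years long.
years_observed = 4.25
years_ahead = 4
sample["predicted_population_2018_annualised"] = (
    sample["population_2014_estimate"] * growth_factor ** (years_ahead / years_observed)
)

print(sample)
```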

## <font color='red'> Now you try
    
1. How many counties are in Alabama?


2. What's the approximate total **number** of Hispanic/Latino people in Texas?


3. What's the mean per capita income in Kansas? 


4. What proportion of people in the USA live in counties where the median household income is less than $30,000?

---

In [41]:
county_data[county_data['state_abbreviation']=='AL'].shape[0]

67
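
Filtering and taking `shape[0]` is one way to count rows. Two other common options are summing a boolean mask and looking up a value in `value_counts()`. A sketch on made-up data (`toy` is not the real dataset):

```python
import pandas as pd

# Made-up state labels, for illustration only.
toy = pd.DataFrame({"state_abbreviation": ["AL", "AL", "TX", "AL", "KS"]})

# True counts as 1 and False as 0, so summing the mask counts matching rows.
is_alabama = toy["state_abbreviation"] == "AL"
count_from_mask = int(is_alabama.sum())

# value_counts() counts every state at once; index into it for a single state.
counts_by_state = toy["state_abbreviation"].value_counts()
count_from_value_counts = int(counts_by_state["AL"])

print(count_from_mask, count_from_value_counts)
```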

In [57]:
county_data['number_hispanic_or_latino'] = 0.01*county_data['hispanic_or_latino_percent_2014']*county_data['population_2014_estimate']
county_data.head()


Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area,land_area_error,predicted_population_2018,number_hispanic_or_latino
0,1001,Autauga County,AL,55395,54571,1.5,54571,6.0,25.2,13.8,...,598175,12003,88157,131,594.44,91.8,594.455338,-0.015338,56225.925,1495.665
1,1003,Baldwin County,AL,200111,182265,9.8,182265,5.6,22.2,18.7,...,2966489,17166,436955,1384,1589.78,114.6,1590.445026,-0.665026,219721.878,9205.106
2,1005,Barbour County,AL,26887,27457,-2.1,27457,5.7,21.2,16.5,...,188337,6334,0,8,884.88,31.0,885.709677,-0.829677,26322.373,1209.915
3,1007,Bibb County,AL,22506,22919,-1.8,22915,5.3,21.0,14.8,...,124707,5804,10757,19,622.58,36.8,622.798913,-0.218913,22100.892,472.626
4,1009,Blount County,AL,57719,57322,0.7,57322,6.1,23.6,17.0,...,319700,5622,20941,3,644.78,88.9,644.791901,-0.011901,58123.033,5021.553


In [58]:
county_data[(county_data['state_abbreviation']=='TX')]['number_hispanic_or_latino'].sum()


10412028.688000001
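
The same total can be produced for every state in one go with `groupby`, which avoids repeating the filter for each state. A sketch on made-up numbers (the frame `toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Made-up counties, for illustration only.
toy = pd.DataFrame({
    "state_abbreviation": ["TX", "TX", "KS"],
    "population_2014_estimate": [1000, 3000, 2000],
    "hispanic_or_latino_percent_2014": [50.0, 10.0, 5.0],
})

# Convert each county's percentage into an approximate head count.
toy["number_hispanic_or_latino"] = (
    0.01 * toy["hispanic_or_latino_percent_2014"] * toy["population_2014_estimate"]
)

# Sum the head counts within each state.
totals_by_state = toy.groupby("state_abbreviation")["number_hispanic_or_latino"].sum()

print(totals_by_state)
```

Summing head counts matters here: averaging the county percentages directly would ignore the fact that counties differ in size.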

In [62]:
county_data[county_data['state_abbreviation']=='KS']['per_capita_money_income_in_past_12_months_2013_dollars_20092013'].mean()



24216.371428571427

In [60]:
people_in_counties_below_30k = county_data[county_data['median_household_income_20092013']<30000]['population_2014_estimate'].sum()
total_people = county_data['population_2014_estimate'].sum()

people_in_counties_below_30k/total_people


0.006631683258092931
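
Note that this is a proportion of *people*, not of *counties*, which is why the populations are summed rather than the rows counted. The sketch below shows how different the two can be on made-up data (all values are assumptions for illustration):

```python
import pandas as pd

# Made-up counties, for illustration only.
toy = pd.DataFrame({
    "median_household_income_20092013": [25000, 45000, 60000, 28000],
    "population_2014_estimate": [1000, 50000, 40000, 9000],
})

below_30k = toy["median_household_income_20092013"] < 30000

# Share of counties: the mean of a boolean mask is the fraction of True values.
share_of_counties = below_30k.mean()

# Share of people: weight each county by its population.
share_of_people = (
    toy.loc[below_30k, "population_2014_estimate"].sum()
    / toy["population_2014_estimate"].sum()
)

print(share_of_counties, share_of_people)
```

Here half of the counties fall below the threshold but only a tenth of the people live in them.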

## <font color='red'> Now you try
    
1. Which county has the lowest proportion of people with a bachelor's degree?


2. Which county has the highest proportion of people living below the poverty level?


3. Which five counties have the highest proportion of people aged 65 and over?

---

In [66]:
county_data.sort_values(by="bachelor's_degree_or_higher_percent_of_persons_age_25plus_20092013", ascending=True).head(1)


Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area,land_area_error,predicted_population_2018,number_hispanic_or_latino
504,13239,Quitman County,GA,2315,2513,-7.9,2513,4.5,17.6,27.7,...,2769,1045,0,5,151.24,16.6,151.385542,-0.145542,2132.115,32.41


In [67]:
county_data.sort_values(by='persons_below_poverty_level_percent_20092013',ascending=False).head(1)


Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area,land_area_error,predicted_population_2018,number_hispanic_or_latino
2417,46113,Shannon County,SD,14218,13586,4.7,13586,11.7,38.1,6.8,...,39229,2883,18096,0,2093.9,6.5,2090.153846,3.746154,14886.246,526.066


In [68]:
county_data.sort_values(by='persons_65_years_and_over_percent_2014',ascending=False).head(5)

Unnamed: 0,fips,area_name,state_abbreviation,population_2014_estimate,population_2010_april_1_estimates_base,population_percent_change_april_1_2010_to_july_1_2014,population_2010,persons_under_5_years_percent_2014,persons_under_18_years_percent_2014,persons_65_years_and_over_percent_2014,...,retail_sales_2007_thousands_dollars,retail_sales_per_capita_2007,accommodation_and_food_services_sales_2007_$1000,building_permits_2014,land_area_in_square_miles_2010,population_per_square_mile_2010,approximate_area,land_area_error,predicted_population_2018,number_hispanic_or_latino
379,12119,Sumter County,FL,114350,93420,22.4,93420,2.0,7.4,52.9,...,672106,9244,79933,2570,546.93,170.8,546.955504,-0.025504,139964.4,6403.6
327,12015,Charlotte County,FL,168474,159989,5.3,159978,3.1,13.0,37.7,...,1898114,12072,178021,610,680.28,235.2,680.22534,0.05466,177403.122,11287.758
102,4012,La Paz County,AZ,20231,20489,-1.3,20489,4.6,17.4,36.1,...,394370,19588,37986,26,4499.63,4.6,4454.130435,45.499565,19967.997,5199.367
328,12017,Citrus County,FL,139377,141236,-1.3,141236,3.8,15.0,35.2,...,1405565,10051,109681,233,581.7,242.8,581.69687,0.00313,137565.099,7247.604
2870,51103,Lancaster County,VA,11044,11395,-3.1,11391,3.8,15.0,35.2,...,166615,14500,15990,32,133.25,85.5,133.274854,-0.024854,10701.636,176.704
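
Sorting the whole DataFrame and taking `head()` works well. pandas also offers `nlargest()`, `nsmallest()`, `idxmax()` and `idxmin()` for the same kind of question. A sketch on four counties whose values are copied from rows displayed in this notebook:

```python
import pandas as pd

# Values copied from rows displayed in this notebook.
toy = pd.DataFrame({
    "area_name": ["Autauga County", "Baldwin County", "Sumter County", "Charlotte County"],
    "persons_65_years_and_over_percent_2014": [13.8, 18.7, 52.9, 37.7],
})

column = "persons_65_years_and_over_percent_2014"

# The n rows with the largest / smallest values, already ordered.
top_two = toy.nlargest(2, column)
bottom_one = toy.nsmallest(1, column)

# idxmax() returns the index label of the largest value, which .loc can look up.
oldest_county = toy.loc[toy[column].idxmax(), "area_name"]

print(top_two)
print(oldest_county)
```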


## <font color='red'> Now you try
    
Using the police stop and search dataset, use `value_counts()` to get breakdowns of:

* The ethnicities of people who were stopped and searched


* The age brackets of people who were stopped and searched


* The 'object of search' (i.e. what officers were looking for)


* The outcomes of the search



* The proportion of people stopped who were men


Use a combination of `value_counts`, `groupby` and filtering skills to calculate:


* The proportion of 18-24 year olds stopped who were men


* The breakdown of reasons people were stopped, broken down by ethnicity


* The breakdown of the ages of people who were stopped, broken down by gender

---



In [69]:
police_data = pd.read_csv('./data/stop_and_search_may_2019_london.csv')

In [70]:
police_data['self_defined_ethnicity'].value_counts()

White - English/Welsh/Scottish/Northern Irish/British                                   142
Other ethnic group - Not stated                                                         137
Black/African/Caribbean/Black British - African                                          71
Black/African/Caribbean/Black British - Any other Black/African/Caribbean background     40
Black/African/Caribbean/Black British - Caribbean                                        27
White - Any other White background                                                       26
Asian/Asian British - Any other Asian background                                         23
Other ethnic group - Any other ethnic group                                              19
Asian/Asian British - Bangladeshi                                                        11
Asian/Asian British - Pakistani                                                           6
White - Irish                                                                   

In [71]:
police_data['object_of_search'].value_counts()

Controlled drugs                       335
Offensive weapons                       97
Stolen goods                            62
Anything to threaten or harm anyone     20
Evidence of offences under the Act      11
Article for use in theft                 4
Name: object_of_search, dtype: int64

In [72]:
police_data['age_range'].value_counts()

18-24       171
25-34       138
over 34     101
10-17        61
under 10      1
Name: age_range, dtype: int64

In [73]:
police_data['outcome'].value_counts()

A no further action disposal       411
Arrest                              63
Community resolution                46
Penalty Notice for Disorder          9
Summons / charged by post            1
Caution (simple or conditional)      1
Name: outcome, dtype: int64

In [75]:
police_data['gender'].value_counts(normalize=True)

Male      0.91635
Female    0.08365
Name: gender, dtype: float64

In [76]:
police_data['outcome'].value_counts(normalize=True)

A no further action disposal       0.772556
Arrest                             0.118421
Community resolution               0.086466
Penalty Notice for Disorder        0.016917
Summons / charged by post          0.001880
Caution (simple or conditional)    0.001880
Name: outcome, dtype: float64
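
`normalize` is a boolean parameter of `value_counts()`: with `normalize=True` the counts are divided by their total, so the result is a set of proportions that sum to 1. A minimal sketch on made-up records:

```python
import pandas as pd

# Made-up records, for illustration only.
genders = pd.Series(["Male", "Male", "Male", "Female"], name="gender")

# Raw counts per category.
counts = genders.value_counts()

# Proportions per category: each count divided by the total number of records.
proportions = genders.value_counts(normalize=True)

print(counts)
print(proportions)
```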

In [78]:
police_data.groupby('age_range')['gender'].value_counts(normalize=True)


age_range  gender
10-17      Male      0.934426
           Female    0.065574
18-24      Male      0.935673
           Female    0.064327
25-34      Male      0.927536
           Female    0.072464
over 34    Male      0.831683
           Female    0.168317
under 10   Female    1.000000
Name: gender, dtype: float64
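
Grouping by `age_range` gives the gender split *within each age bracket*. To get the age split *within each gender* instead, swap the two columns. `pd.crosstab` with `normalize='index'` gives the same numbers as a table. A sketch on made-up records (`toy` is not the real police data):

```python
import pandas as pd

# Made-up records, for illustration only.
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female"],
    "age_range": ["18-24", "18-24", "25-34", "18-24"],
})

# Age breakdown within each gender: proportions sum to 1 per gender.
age_by_gender = toy.groupby("gender")["age_range"].value_counts(normalize=True)

# Same idea as a table: one row per gender, one column per age bracket.
age_by_gender_table = pd.crosstab(toy["gender"], toy["age_range"], normalize="index")

print(age_by_gender)
print(age_by_gender_table)
```

Which column goes inside `groupby()` decides what each set of proportions sums to 1 over, so it is worth checking against the wording of the question.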

In [81]:
police_data.groupby('officer_defined_ethnicity')['object_of_search'].value_counts(normalize=True)


officer_defined_ethnicity  object_of_search                   
Asian                      Controlled drugs                       0.661290
                           Offensive weapons                      0.161290
                           Anything to threaten or harm anyone    0.080645
                           Stolen goods                           0.064516
                           Article for use in theft               0.016129
                           Evidence of offences under the Act     0.016129
Black                      Controlled drugs                       0.633663
                           Offensive weapons                      0.227723
                           Stolen goods                           0.089109
                           Anything to threaten or harm anyone    0.034653
                           Article for use in theft               0.009901
                           Evidence of offences under the Act     0.004950
Other                      Controlled


## <font color='red'> Now you try
    
1. What's the breakdown of states, in counties where the median household income is equal to or below the **lower quartile value**? (hint: `describe()` will get you lower and upper quartiles)


2. How many people are living in counties where the median household income is equal to or below the lower quartile value?


3. What's the breakdown of states, in counties where the median household income is equal to or above the **upper quartile value**?



---

In [87]:
# Summarise the income column once, then read both quartiles from the summary
income_summary = county_data['median_household_income_20092013'].describe()
lq_value = income_summary['25%']
uq_value = income_summary['75%']

county_data[county_data['median_household_income_20092013']<=lq_value]['state_abbreviation'].value_counts()

GA    86
MS    63
KY    58
TX    56
AR    53
TN    49
MO    43
AL    40
NC    39
VA    35
WV    29
LA    25
OK    23
SC    22
MI    18
FL    18
NM    17
SD    13
CO    11
MT    11
ID    10
IL     8
OH     8
KS     7
OR     7
NE     6
CA     5
ND     5
AZ     4
ME     3
IN     2
PA     2
WA     2
NV     2
NY     1
MN     1
IA     1
VT     1
AK     1
WI     1
Name: state_abbreviation, dtype: int64
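
Question 2 (how many people live in counties at or below the lower quartile) can be answered with the same filter, summing population instead of counting states. This is a sketch on made-up data, so the numbers below are not the answer for the real dataset:

```python
import pandas as pd

# Made-up counties, for illustration only.
toy = pd.DataFrame({
    "median_household_income_20092013": [20000, 30000, 40000, 50000],
    "population_2014_estimate": [1000, 2000, 3000, 4000],
})

income = toy["median_household_income_20092013"]

# describe() reports the lower quartile under the label '25%'.
lq_value = income.describe()["25%"]

# Keep counties at or below the lower quartile, then add up their populations.
in_lower_quartile = income <= lq_value
people_in_lower_quartile = toy.loc[in_lower_quartile, "population_2014_estimate"].sum()

print(lq_value, people_in_lower_quartile)
```

On `county_data` the same pattern would use the real `lq_value` and the `population_2014_estimate` column.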

In [89]:
county_data[county_data['median_household_income_20092013']>=uq_value]['state_abbreviation'].value_counts()

TX    54
VA    52
MN    35
IL    32
IA    32
CA    32
NY    29
CO    28
GA    26
ND    25
WI    24
OH    23
AK    23
IN    21
PA    21
KS    21
NJ    20
NE    19
MD    18
WA    16
WY    16
MI    15
SD    15
UT    13
MO    13
MA    12
OK    11
FL    11
KY    11
LA    10
NC    10
NV    10
TN     8
CT     8
NH     8
VT     8
OR     7
MT     6
ID     5
HI     5
SC     5
AL     4
RI     4
NM     3
DE     3
ME     3
AR     3
WV     3
MS     3
DC     1
AZ     1
Name: state_abbreviation, dtype: int64