# Voters in Florida

The file FloridaVoters.html contains a web table of Republican and Democratic voter registrations in the counties of Florida. The code reads this file as plain text and prints each county with its number of Republican and Democratic voters, sorted by the number of Democratic voters, followed by a total line. The output looks like this: <br><br>
LAFAYETTE 1373 2672<br>
GLADES 2190 3110<br>
LIBERTY 720 3372<br>
...<br>
MIAMI-DADE 362161 539367<br>
BROWARD 249762 566185<br>
Total 4377713 4637026<br>
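For orientation, here is a compact, dependency-free sketch of the same pipeline: extract the table body, split it into rows and cells, keep the county and the first two counts, sort by the Democratic count, and append a total line. It runs on a hypothetical three-county `sample` string rather than the real file. `voter_report` is an illustrative name, and the sketch assumes the `Total` line is simply the sum over all counties:

```python
import re

# Hypothetical three-county fragment in the same shape as the real <tbody>
sample = ('<tbody><tr><td>ALACHUA</td><td>47,329</td><td>77,996</td><td>3,864</td></tr>'
          '<tr><td>BAKER</td><td>6,963</td><td>5,813</td><td>184</td></tr>'
          '<tr><td>BAY</td><td>57,456</td><td>30,733</td><td>2,441</td></tr></tbody>')


def voter_report(html):
    """Return report lines (county, Republican, Democratic), sorted by the Democratic count."""
    body = ''.join(re.findall(r'<tbody>(.*?)</tbody>', html))
    records = []
    for row in re.findall(r'<tr>(.*?)</tr>', body):
        cells = re.findall(r'<td>(.*?)</td>', row)
        # Keep the first three columns; drop thousands separators before int()
        county = cells[0]
        rep = int(cells[1].replace(',', ''))
        dem = int(cells[2].replace(',', ''))
        records.append((county, rep, dem))
    records.sort(key=lambda r: r[2])  # ascending by Democratic voters
    lines = [f'{c} {r} {d}' for c, r, d in records]
    # Assumption: the "Total" line is just the column sums over all counties
    total_rep = sum(r for _, r, _ in records)
    total_dem = sum(d for _, _, d in records)
    lines.append(f'Total {total_rep} {total_dem}')
    return lines


print('\n'.join(voter_report(sample)))
```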

In [117]:
## Import pandas and the regular expressions module
import pandas as pd
import re

In [118]:
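## Open the HTML file in read mode (the page declares charset="UTF-8")
file = open('FloridaVoters.html', 'r', encoding='utf-8')

In [119]:
## Read the file contents with newline characters removed (by default '.' in a regex does not match '\n'), then close the file

html = file.read().replace('\n', '')
file.close()
html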

'\t<!DOCTYPE html><html class="no-js">  <head>    <title>Voter Registration - Current by County - Division of Elections - Florida Department of State </title>      <meta charset="UTF-8" />      <meta name="viewport" content="width=device-width, initial-scale=1.0" />      <meta http-equiv="X-UA-Compatible" content="IE=edge" /><script type="text/javascript">window.NREUM||(NREUM={});NREUM.info = {"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"a2a55726f0","applicationID":"6568901","transactionName":"YFEGYkVRWUQAUBJeW1kbKWB0H2VSD1cDRXlBVydZWURFWA1fA0UbZ1UDUw==","queueTime":15,"applicationTime":73,"ttGuid":"98871DE0A3060523","agent":""}</script><script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(e,n,t){function r(t){if(!n[t]){var o=n[t]={exports:{}};e[t][0].call(o.exports,function(n){var o=e[t][1][n];return r(o||n)},o,o.exports)}return n[t].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<t.length;o++)r(t[o]);ret
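Stripping the newlines matters because, by default, `.` in a Python regular expression matches any character except `\n`, so a pattern such as `<tbody>(.*?)</tbody>` cannot span lines. A small sketch on a hypothetical `sample` string showing the failure, the stripping approach, and the `re.DOTALL` flag as an alternative that leaves the text intact:

```python
import re

# Hypothetical HTML fragment with the table body split across lines
sample = "<table><tbody>\n<tr><td>BAKER</td></tr>\n</tbody></table>"

# Without DOTALL, '.' stops at '\n', so the pattern cannot reach '</tbody>'
plain = re.findall(r'<tbody>(.*?)</tbody>', sample)

# Option 1: strip the newlines first (the approach used in this notebook)
stripped = re.findall(r'<tbody>(.*?)</tbody>', sample.replace('\n', ''))

# Option 2: keep the text as-is and let '.' match newlines too
dotall = re.findall(r'<tbody>(.*?)</tbody>', sample, flags=re.DOTALL)

print(plain)     # []
print(stripped)  # ['<tr><td>BAKER</td></tr>']
print(dotall)    # ['\n<tr><td>BAKER</td></tr>\n']
```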

In [120]:
## Extract the contents of the table body (<tbody>...</tbody>) from the HTML page

match = re.findall(r'<tbody>(.*?)</tbody>', html)
match

['<tr><td>ALACHUA</td><td>47,329</td><td>77,996</td><td>3,864</td><td>34,116</td><td>163,305</td></tr><tr><td>BAKER</td><td>6,963</td><td>5,813</td><td>184</td><td>1,237</td><td>14,197</td></tr><tr><td>BAY</td><td>57,456</td><td>30,733</td><td>2,441</td><td>20,625</td><td>111,255</td></tr><tr><td>BRADFORD</td><td>6,878</td><td>6,533</td><td>251</td><td>1,971</td><td>15,633</td></tr><tr><td>BREVARD</td><td>167,129</td><td>127,435</td><td>13,960</td><td>86,702</td><td>395,226</td></tr><tr><td>BROWARD</td><td>249,762</td><td>566,185</td><td>17,602</td><td>286,877</td><td>1,120,426</td></tr><tr><td>CALHOUN</td><td>2,201</td><td>5,324</td><td>86</td><td>771</td><td>8,382</td></tr><tr><td>CHARLOTTE</td><td>54,421</td><td>35,602</td><td>5,246</td><td>29,131</td><td>124,400</td></tr><tr><td>CITRUS</td><td>46,373</td><td>30,440</td><td>3,394</td><td>21,996</td><td>102,203</td></tr><tr><td>CLAY</td><td>76,584</td><td>31,285</td><td>3,782</td><td>29,120</td><td>140,771</td></tr><tr><td>COLLIER</t
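A greedy `(.*)` is fine when the page has exactly one `<tbody>`, but with several it captures everything from the first `<tbody>` to the last `</tbody>`, including unrelated markup in between. The lazy form `(.*?)` stops at the first closing tag and yields one item per table body. A small sketch on a made-up `page` string (hypothetical, not from the Florida file):

```python
import re

# Hypothetical page with two table bodies and unrelated markup between them
page = ("<tbody><tr><td>A</td></tr></tbody>"
        "<p>not table data</p>"
        "<tbody><tr><td>B</td></tr></tbody>")

# Greedy: '.*' runs to the LAST '</tbody>', swallowing the <p> in between
greedy = re.findall(r'<tbody>(.*)</tbody>', page)

# Lazy: '.*?' stops at the FIRST '</tbody>', giving one item per tbody
lazy = re.findall(r'<tbody>(.*?)</tbody>', page)

print(len(greedy))  # 1
print(lazy)         # ['<tr><td>A</td></tr>', '<tr><td>B</td></tr>']
```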

In [121]:
## Join the resulting list into a single string for further matching

match_str = ''.join(match)
match_str

'<tr><td>ALACHUA</td><td>47,329</td><td>77,996</td><td>3,864</td><td>34,116</td><td>163,305</td></tr><tr><td>BAKER</td><td>6,963</td><td>5,813</td><td>184</td><td>1,237</td><td>14,197</td></tr><tr><td>BAY</td><td>57,456</td><td>30,733</td><td>2,441</td><td>20,625</td><td>111,255</td></tr><tr><td>BRADFORD</td><td>6,878</td><td>6,533</td><td>251</td><td>1,971</td><td>15,633</td></tr><tr><td>BREVARD</td><td>167,129</td><td>127,435</td><td>13,960</td><td>86,702</td><td>395,226</td></tr><tr><td>BROWARD</td><td>249,762</td><td>566,185</td><td>17,602</td><td>286,877</td><td>1,120,426</td></tr><tr><td>CALHOUN</td><td>2,201</td><td>5,324</td><td>86</td><td>771</td><td>8,382</td></tr><tr><td>CHARLOTTE</td><td>54,421</td><td>35,602</td><td>5,246</td><td>29,131</td><td>124,400</td></tr><tr><td>CITRUS</td><td>46,373</td><td>30,440</td><td>3,394</td><td>21,996</td><td>102,203</td></tr><tr><td>CLAY</td><td>76,584</td><td>31,285</td><td>3,782</td><td>29,120</td><td>140,771</td></tr><tr><td>COLLIER</td

In [122]:
## Extract the contents of every table row (<tr>...</tr>)

match_1 = re.findall(r'<tr>(.*?)</tr>', match_str)
match_1

['<td>ALACHUA</td><td>47,329</td><td>77,996</td><td>3,864</td><td>34,116</td><td>163,305</td>',
 '<td>BAKER</td><td>6,963</td><td>5,813</td><td>184</td><td>1,237</td><td>14,197</td>',
 '<td>BAY</td><td>57,456</td><td>30,733</td><td>2,441</td><td>20,625</td><td>111,255</td>',
 '<td>BRADFORD</td><td>6,878</td><td>6,533</td><td>251</td><td>1,971</td><td>15,633</td>',
 '<td>BREVARD</td><td>167,129</td><td>127,435</td><td>13,960</td><td>86,702</td><td>395,226</td>',
 '<td>BROWARD</td><td>249,762</td><td>566,185</td><td>17,602</td><td>286,877</td><td>1,120,426</td>',
 '<td>CALHOUN</td><td>2,201</td><td>5,324</td><td>86</td><td>771</td><td>8,382</td>',
 '<td>CHARLOTTE</td><td>54,421</td><td>35,602</td><td>5,246</td><td>29,131</td><td>124,400</td>',
 '<td>CITRUS</td><td>46,373</td><td>30,440</td><td>3,394</td><td>21,996</td><td>102,203</td>',
 '<td>CLAY</td><td>76,584</td><td>31,285</td><td>3,782</td><td>29,120</td><td>140,771</td>',
 '<td>COLLIER</td><td>100,167</td><td>45,511</td><td>4,622</
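The row-then-cell extraction can also be written as one nested comprehension. The sketch below is a variation, not the notebook's code: it runs on a small hypothetical `tbody` string and loosens the patterns to `<tr(?:\s[^>]*)?>` and `<td(?:\s[^>]*)?>`, so tags carrying attributes (for example `class="odd"`) still match, which the plain `<tr>`/`<td>` patterns would miss:

```python
import re

# Hypothetical table body; the second row carries attributes on its tags
tbody = ('<tr><td>BAKER</td><td>6,963</td><td>5,813</td></tr>'
         '<tr class="odd"><td align="right">BAY</td><td>57,456</td><td>30,733</td></tr>')

# '(?:\s[^>]*)?' allows optional attributes after the tag name
ROW = re.compile(r'<tr(?:\s[^>]*)?>(.*?)</tr>')
CELL = re.compile(r'<td(?:\s[^>]*)?>(.*?)</td>')

# One list of cell strings per row
rows = [CELL.findall(row) for row in ROW.findall(tbody)]
print(rows)  # [['BAKER', '6,963', '5,813'], ['BAY', '57,456', '30,733']]
```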

In [123]:
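## Finally, extract every cell (<td>...</td>) from each row and build a DataFrame from the rows
## (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first)

rows = []
for item in match_1:
    match_2 = re.findall(r'<td>(.*?)</td>', item)
    rows.append(match_2)

df = pd.DataFrame(rows)
df

|  | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | ALACHUA | 47,329 | 77,996 | 3,864 | 34,116 | 163,305 |
| 1 | BAKER | 6,963 | 5,813 | 184 | 1,237 | 14,197 |
| 2 | BAY | 57,456 | 30,733 | 2,441 | 20,625 | 111,255 |
| 3 | BRADFORD | 6,878 | 6,533 | 251 | 1,971 | 15,633 |
| 4 | BREVARD | 167,129 | 127,435 | 13,960 | 86,702 | 395,226 |
| ... | ... | ... | ... | ... | ... | ... |
| 63 | VOLUSIA | 121,402 | 124,136 | 11,537 | 88,882 | 345,957 |
| 64 | WAKULLA | 7,374 | 8,889 | 560 | 2,681 | 19,504 |
| 65 | WALTON | 25,609 | 10,013 | 842 | 8,150 | 44,614 |
| 66 | WASHINGTON | 7,101 | 5,687 | 221 | 1,690 | 14,699 |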
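Regular expressions work here because the table markup is very regular, but they break as soon as a tag gains attributes or the layout changes. As an alternative (swapped in, not what this notebook does), the standard library's `html.parser.HTMLParser` can collect the cell text row by row. `TableCellParser` and `fragment` are hypothetical names written for this sketch:

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being filled
        self._cell = None  # text pieces of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. a <th> header row
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed markup

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical fragment: attributes, whitespace and a header row are all tolerated
fragment = ('<table><thead><tr><th>County</th></tr></thead><tbody>\n'
            '<tr><td class="c">BAKER</td><td> 6,963 </td></tr>\n'
            '<tr><td>BAY</td><td>57,456</td></tr></tbody></table>')

parser = TableCellParser()
parser.feed(fragment)
parser.close()
print(parser.rows)  # [['BAKER', '6,963'], ['BAY', '57,456']]
```

The resulting `parser.rows` is a list of per-row cell lists, so it could be passed straight to `pd.DataFrame`.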


In [124]:
## Remove the thousands separators (',') from the numbers
df = df.replace(',', '', regex=True)

## Keep only the first three columns and rename them for readability
df = df.iloc[:, 0:3]
df.columns = ['County', 'Republican Voters', 'Democrat Voters']

## Convert the numeric strings to integers
data_types_dict = {'County': str, 'Republican Voters': int, 'Democrat Voters': int}
df = df.astype(data_types_dict)

## Sort the rows in ascending order of 'Democrat Voters' and renumber the index
df_final = df.sort_values(by='Democrat Voters')
df_final = df_final.reset_index(drop=True)

df_final

|  | County | Republican Voters | Democrat Voters |
|---|---|---|---|
| 0 | LAFAYETTE | 1373 | 2672 |
| 1 | GLADES | 2190 | 3110 |
| 2 | LIBERTY | 720 | 3372 |
| 3 | UNION | 2752 | 3579 |
| 4 | GILCHRIST | 5789 | 3652 |
| ... | ... | ... | ... |
| 63 | HILLSBOROUGH | 257436 | 314265 |
| 64 | PALM BEACH | 245452 | 367236 |
| 65 | MIAMI-DADE | 362161 | 539367 |
| 66 | BROWARD | 249762 | 566185 |
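The cleaning step removes commas across the whole frame with `df.replace(..., regex=True)` and then casts with `astype`. A minimal alternative sketch restricts the replacement to the two count columns and converts with `pd.to_numeric`. It uses three rows copied from the table (ALACHUA, BAKER, BAY); the names `raw`, `counts`, `clean` and `result` are illustrative only:

```python
import pandas as pd

# Three raw rows from the table (County, Republican, Democratic), still as strings
raw = pd.DataFrame(
    [['ALACHUA', '47,329', '77,996'],
     ['BAKER', '6,963', '5,813'],
     ['BAY', '57,456', '30,733']],
    columns=['County', 'Republican Voters', 'Democrat Voters'],
)

# Strip thousands separators in the count columns only, then convert to integers
counts = ['Republican Voters', 'Democrat Voters']
clean = raw.copy()
clean[counts] = clean[counts].apply(
    lambda col: pd.to_numeric(col.str.replace(',', '', regex=False))
)

# Ascending sort on the Democratic count, with a fresh 0..n-1 index
result = clean.sort_values('Democrat Voters').reset_index(drop=True)
print(result)
```

`pd.to_numeric` raises on any value that is not a number, so a stray footnote marker or blank cell surfaces as an error instead of slipping through; passing `errors='coerce'` would turn such cells into `NaN` instead.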


In [125]:
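## Print each county with its Republican and Democratic voter counts, then the totals

for county, rep_voters, dem_voters in df_final.itertuples(index=False):
    print(county, rep_voters, dem_voters)
print('Total', df_final['Republican Voters'].sum(), df_final['Democrat Voters'].sum())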

LAFAYETTE 1373 2672
GLADES 2190 3110
LIBERTY 720 3372
UNION 2752 3579
GILCHRIST 5789 3652
FRANKLIN 2234 4319
HOLMES 5282 4434
GULF 4234 4521
HARDEE 4859 4702
HAMILTON 2154 4796
DIXIE 3314 4839
CALHOUN 2201 5324
WASHINGTON 7101 5687
JEFFERSON 2636 5802
BAKER 6963 5813
BRADFORD 6878 6533
TAYLOR 3950 6915
MADISON 2992 7158
DESOTO 4870 7181
OKEECHOBEE 7755 7756
HENDRY 5862 7999
WAKULLA 7374 8889
LEVY 11665 9509
WALTON 25609 10013
SUWANNEE 10745 11126
NASSAU 32958 14013
COLUMBIA 15790 14797
JACKSON 9626 15706
MONROE 20602 17614
HIGHLANDS 27100 19997
PUTNAM 17067 20606
GADSDEN 4372 22161
SUMTER 47158 22977
FLAGLER 30047 24734
OKALOOSA 75154 25172
SANTA ROSA 73627 26114
MARTIN 53800 27358
INDIAN RIVER 47794 28204
CITRUS 46373 30440
BAY 57456 30733
CLAY 76584 31285
CHARLOTTE 54421 35602
ST. JOHNS 88385 39531
HERNANDO 51254 42499
COLLIER 100167 45511
LAKE 93604 67237
MANATEE 96063 67926
ESCAMBIA 90265 70180
OSCEOLA 44594 75657
ST. LUCIE 59626 76163
MARION 97306 76268
ALACHUA 47329 77996
SARASOT

# END-OF-CODE