# geophf/1HaskellADay

module Y2017.M01.D05.Solution where

import Control.Arrow ((&&&))
import Data.Array
import Data.Function (on)
import Data.List (sortBy)
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- below imports available from 1HaskellADay git repository

import Data.Monetary.Currency (value)
import Data.SAIPE.USStates
import Graph.ScoreCard
import Y2016.M12.D21.Solution
import Y2016.M12.D22.Solution
import Y2017.M01.D04.Solution

{--
So, yesterday we looked at US State SAIPE data. The solution showed us the two
standout states: Wyoming had the smallest population and the least poverty, and
California had the largest population and the most poverty.

But 'least' and 'most' can be measured several ways. Yesterday, we simply
looked at the number of people in poverty, but did not take into account the
total population of the State. Who knows? Maybe California has the least
poverty if one adjusts for population? Yes? No?

Let's find out.

Today's Haskell exercise. Read in the SAIPE data from

Y2016/M12/D15/SAIPESNC_15DEC16_11_35_13_00.csv.gz

collating by US State (see yesterday's exercise), then enhance the score-card
with the ratio of people in poverty to the entire population:
--}

type PovertyRatio = Float  -- really a poverty / population ratio, but okay

-- Divides the Poverty count by the Population count of the same score-card.
povertyRatio :: ScoreCard a Axes Float -> PovertyRatio
povertyRatio = ((/) . (! Poverty) <*> (! Population)) . values

data Attribs = POPULATION | POVERTY | POVERTYRATIO
             | TOTALDEBT | PERCAPITADEBT  -- will be used (read further down)
   deriving (Eq, Ord, Enum, Bounded, Ix, Show)

type EnhancedSC = ScoreCard USState Attribs Float

enhancedScoreCard :: ScoreCard USState Axes Float -> EnhancedSC
enhancedScoreCard sc@(SC state arr) =
   SC state (listArray (POPULATION, POVERTYRATIO)
                       [arr ! Population, arr ! Poverty, povertyRatio sc])

-- Great. Now. Which US State has the highest poverty ratio? The lowest?

-- Sorts ascending by poverty ratio: least impoverished first.
impoverished :: [EnhancedSC] -> [EnhancedSC]
impoverished = sortBy (compare `on` (! POVERTYRATIO) . values)

{--
*Y2017.M01.D05.Solution> readSAIPEUSStateData "Y2016/M12/D15/SAIPESNC_15DEC16_11_35_13_00.csv.gz" ~> states
*Y2017.M01.D05.Solution> let enhsts = map enhancedScoreCard states
*Y2017.M01.D05.Solution> head enhsts
SC {idx = Alabama, values = [(POPULATION,4736374.0),(POVERTY,875853.0),
                             (POVERTYRATIO,0.18492058)]}
*Y2017.M01.D05.Solution> let improv = impoverished enhsts

least impoverished:

*Y2017.M01.D05.Solution> head improv
SC {idx = New Hampshire, values = [(POPULATION,1288048.0),(POVERTY,108293.0),
                                   (POVERTYRATIO,8.407528e-2)]}

most impoverished:

*Y2017.M01.D05.Solution> last improv
SC {idx = Mississippi, values = [(POPULATION,2896612.0),(POVERTY,638919.0),
                                 (POVERTYRATIO,0.22057459)]}

Remember reading in US State total and per capita debt? That was the exercise
for Y2016.M12.D22. Read that information in again from:

Y2016/M12/D22/personal_debt_load_by_US_state.csv

And round out the US State scorecard information with those attributes:
--}

-- Widens the array bounds to PERCAPITADEBT and appends the two debt values.
debtInfoAugmenter :: EnhancedSC -> USStateDebt -> EnhancedSC
debtInfoAugmenter (SC state arr) debtinfo =
   let conv f = fromRational . value . f
   in  SC state (array (POPULATION, PERCAPITADEBT)
                       (assocs arr ++ [(TOTALDEBT, conv stateDebt debtinfo),
                                       (PERCAPITADEBT, conv perCapitaDebt debtinfo)]))

{--
So, here's the thing: the [EnhancedSC] may not be in the same order as the
[USStateDebt] ... so we have to match by US State... MAP TO THE RESCUE!
--}

-- Debt rows whose state has no matching score-card are dropped by mapMaybe.
augmentWithDebtInfo :: [EnhancedSC] -> [USStateDebt] -> [EnhancedSC]
augmentWithDebtInfo states =
   let mapped = Map.fromList (map (idx &&& id) states)
   in  mapMaybe (\debt -> Map.lookup (name debt) mapped >>=
                          return . flip debtInfoAugmenter debt)

{--
*Y2017.M01.D05.Solution> readUSStateDebtData debtURL ~> debts
*Y2017.M01.D05.Solution> let auggies = augmentWithDebtInfo enhsts debts
*Y2017.M01.D05.Solution> head auggies
SC {idx = Alabama, values = [(POPULATION,4736374.0),(POVERTY,875853.0),
                             (POVERTYRATIO,0.18492058),(TOTALDEBT,6.8343595e10),
                             (PERCAPITADEBT,14173.005)]}

Now that you have a set of collated data for US States with debt and poverty
information, is there a correlation?

Or: what are the top 5 US States in debt? Bottom 5?
What are the top 5 US States with the highest poverty ratios? Lowest?

debts:

*Y2017.M01.D05.Solution> let sortedDebts = sortBy (compare `on` (! TOTALDEBT) . values) auggies

bottom 5:

*Y2017.M01.D05.Solution> mapM_ (print . (idx &&& (! TOTALDEBT) . values)) (take 5 sortedDebts)
(South Dakota,7.707458e9)
(Vermont,7.866666e9)
(North Dakota,9.263742e9)
(Wyoming,9.951523e9)
(Nebraska,1.3139045e10)

top 5:

*Y2017.M01.D05.Solution> mapM_ (print . (idx &&& (! TOTALDEBT) . values)) (take 5 (reverse sortedDebts))
(California,7.779184e11)
(New York,3.8746567e11)
(Texas,3.4094422e11)
(Illinois,3.213541e11)
(Ohio,3.2134077e11)

California takes the cake!

Poverty ratios:

*Y2017.M01.D05.Solution> let sortedPovertyRatios = sortBy (compare `on` (! POVERTYRATIO) . values) auggies

Lowest poverty ratios:

*Y2017.M01.D05.Solution> mapM_ (print . (idx &&& (! POVERTYRATIO) . values)) (take 5 sortedPovertyRatios)
(New Hampshire,8.407528e-2)
(Maryland,9.949522e-2)
(Minnesota,0.1018338)
(Alaska,0.103974395)
(Vermont,0.104290456)

highest poverty ratios:

*Y2017.M01.D05.Solution> mapM_ (print . (idx &&& (! POVERTYRATIO) . values)) (take 5 (reverse sortedPovertyRatios))
(Mississippi,0.22057459)
(New Mexico,0.19827485)
(Louisiana,0.19504689)
(Arkansas,0.18727748)
(Alabama,0.18492058)

Now ratios are very small numbers in [0, 1], but populations and debts are very
big ones. Tomorrow we'll level these disparate data sets and look at clustered
results.

Added charts and CSV files of resulting data analyses:

Y2017/M01/D05/01-poverty-us-state.png
Y2017/M01/D05/02-total-debts-us-state.png
Y2017/M01/D05/debt-by-us-state.csv
Y2017/M01/D05/poverty-by-us-state.csv
--}
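
The solution above depends on modules from the 1HaskellADay repository (`Graph.ScoreCard`, `Data.SAIPE.USStates`, and the earlier solutions), so it cannot be loaded on its own. The sketch below illustrates the same three steps (computing a poverty ratio, ranking by it, and joining debt rows to state rows through a `Map`) using only `base` and `containers`. The types `StateRow` and `DebtRow` and every function name here are hypothetical stand-ins, not the repository's API, and `Double` replaces the original's `Float`. The sample figures are copied from the GHCi output above; California has a debt row but no state row in the sample, which shows how `mapMaybe` drops unmatched entries.

```haskell
import Data.Function (on)
import Data.List (sortBy)
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for the repository's ScoreCard (assumption).
data StateRow = StateRow
  { stName     :: String
  , population :: Double
  , inPoverty  :: Double
  } deriving (Eq, Show)

-- Hypothetical stand-in for the repository's USStateDebt (assumption).
data DebtRow = DebtRow
  { debtState :: String
  , totalDebt :: Double
  } deriving (Eq, Show)

-- Share of a state's population living in poverty.
ratio :: StateRow -> Double
ratio r = inPoverty r / population r

-- Ascending by poverty ratio: least impoverished first.
rankByRatio :: [StateRow] -> [StateRow]
rankByRatio = sortBy (compare `on` ratio)

-- Pair each debt row with the state row of the same name.
-- Debt rows with no matching state are dropped, as in augmentWithDebtInfo.
joinDebt :: [StateRow] -> [DebtRow] -> [(StateRow, DebtRow)]
joinDebt rows = mapMaybe pair
  where
    byName = Map.fromList [(stName r, r) | r <- rows]
    pair d = fmap (\r -> (r, d)) (Map.lookup (debtState d) byName)

-- Population and poverty counts taken from the GHCi sessions above.
sampleStates :: [StateRow]
sampleStates =
  [ StateRow "Alabama"       4736374 875853
  , StateRow "New Hampshire" 1288048 108293
  , StateRow "Mississippi"   2896612 638919
  ]

-- Total debts taken from the GHCi sessions above.
sampleDebts :: [DebtRow]
sampleDebts =
  [ DebtRow "Alabama"    6.8343595e10
  , DebtRow "California" 7.779184e11
  ]
```

In GHCi, `map stName (rankByRatio sampleStates)` lists the sample states from least to most impoverished, and `joinDebt sampleStates sampleDebts` keeps only the Alabama pairing. Keying the `Map` on the state name mirrors the original's `Map.fromList (map (idx &&& id) states)`, which makes the join independent of the order of either list.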