### Working with CSV files in Haskell

Every way I have seen to work with CSV files in Haskell seems fraught.<br>
There is a lot of overhead in terms of imported modules, and a whole<br>
slew of type-theoretic thinking to be done. This document is intended to act both as<br>
a guide to `Data.Csv` (from the `cassava` package) and as a bread-crumb trail back through its forest.<p>

As a working example, I will demonstrate how to build appropriate record data types<br>
and then write instances of the `Data.Csv` class `FromRecord` for these types.<p>

First, the imports:

In [7]:
{-# LANGUAGE DeriveGeneric #-}
import qualified Data.ByteString.Lazy as BL
import Data.Vector (Vector, empty, toList)
import Data.Either.Extra (fromRight)
import GHC.Generics (Generic)
import Data.List
import Data.Csv

Yes, there is a lot here. No, I will not describe what each import is about.<p>
Next, I want to write two data types. The first describes how<br>
a `GoogleRecord` is formed, and the second how a `WyndhamRecord`<br>
is formed. Once we have these types, both of them need an instance of the<br>
`FromRecord` class. Since the `GoogleRecord` is *nicely* formed (its fields are read from the columns in order),<br>
we can use a derived instance. Unfortunately, the `WyndhamRecord` is not so<br>
*nicely* formed (it only uses some of the columns), and so we will need to define its instance explicitly.

In [10]:
data GoogleRecord = BadGoogleRecord |
                    GRec {accountId :: !Integer,
                          accountName :: !String,
                          groupName :: !String,
                          hotelId :: !Integer,
                          hotelName :: !String} deriving (Generic, Show)

data WyndhamRecord = WRec {brand :: !String, site :: !Integer} |
                     BadWyndhamRecord deriving (Generic, Show)
                     
instance FromRecord GoogleRecord -- derived via Generic: fields are read in column order
instance FromRecord WyndhamRecord where -- a FromNamedRecord instance would select columns by header name instead
    parseRecord v = WRec <$> v .! 0 <*> v .! 2 -- columns 0 and 2 of each row

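The comment above hints that looking columns up by name would be more specific than indexing by position. A minimal sketch of that alternative, using cassava's `FromNamedRecord` class and `decodeByName`. The type `WyndhamNamed`, its field names, the header names `"brand"` and `"site"`, and the sample data are all assumptions made for illustration; the real file's headers may differ. `parsedNamed` holds either the parser's error message or the decoded rows.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromNamedRecord (..), decodeByName, (.:))
import Data.Vector (toList)

-- Hypothetical stand-in for WyndhamRecord, renamed to avoid clashing with it.
data WyndhamNamed = WyndhamNamed { wnBrand :: !String, wnSite :: !Integer }
  deriving (Show, Eq)

-- Fields are looked up by header name, so column order no longer matters.
-- "brand" and "site" are assumed header names.
instance FromNamedRecord WyndhamNamed where
  parseNamedRecord r = WyndhamNamed <$> r .: "brand" <*> r .: "site"

-- Made-up sample data; the middle column is simply ignored.
sampleNamed :: BL.ByteString
sampleNamed = "brand,region,site\nWyndham,US,101\nRamada,US,202\n"

-- decodeByName returns the header alongside the rows; keep only the rows.
parsedNamed :: Either String [WyndhamNamed]
parsedNamed = toList . snd <$> decodeByName sampleNamed
```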
Now, the dream is to parse actual data and have things play nice.<br>
The `decode` function from `Data.Csv` returns its result as an<br>
`Either String (Vector a)`: an error message on failure, or a vector of parsed rows.<br>
It seems like a good idea to create type synonyms to keep track.<br>
We also need small functions that run the decoder and unwrap the result.

In [12]:
type EitherGoogle  = Either String (Vector GoogleRecord)
type EitherWyndham = Either String (Vector WyndhamRecord)

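-- A failed parse yields an empty list: fromRight falls back to the empty Vector.
googleRecords :: BL.ByteString -> [GoogleRecord]
googleRecords = toList . fromRight empty . parseCsv
  where parseCsv csv = decode HasHeader csv :: EitherGoogle

wyndhamRecords :: BL.ByteString -> [WyndhamRecord]
wyndhamRecords = toList . fromRight empty . parseCsv
  where parseCsv csv = decode HasHeader csv :: EitherWyndham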

Thankfully, due to the instances above, a type annotation<br>
in the `parseCsv` helper functions is all `decode` needs to pick the right parser for each CSV!<br>
Pure magic.<p>

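One caveat with unwrapping via `fromRight empty`: a file that fails to parse looks exactly like an empty file. A minimal sketch of keeping the `Left` error message instead. The names `parseRows` and `describeResult` are hypothetical, and the rows are decoded as plain `(String, Integer)` tuples (which already have `FromRecord` instances) to keep the example self-contained.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (..), decode)
import Data.Vector (toList)

-- Decode two-column rows, skipping the header line.
-- On failure the Left carries cassava's error message.
parseRows :: BL.ByteString -> Either String [(String, Integer)]
parseRows csv = toList <$> decode HasHeader csv

-- Summarise the outcome, keeping the error text on failure
-- rather than silently returning no rows.
describeResult :: Either String [(String, Integer)] -> String
describeResult (Left err)   = "parse failed: " ++ err
describeResult (Right rows) = "parsed " ++ show (length rows) ++ " row(s)"
```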
Next, let's imagine that we would like to compare how the `groupName` values in<br>
the Google CSV differ from the `brand` values in the Wyndham CSV.<br>
I want a function that pairs each Google record with the Wyndham<br>
records whose `site` equals its `hotelId`, and then returns each<br>
Google `groupName` together with the list of matching Wyndham brands.

In [14]:
type Brand = String
type GroupName = String

returnBrand :: [GoogleRecord] -> [WyndhamRecord] -> [(GroupName, [Brand])]
returnBrand [] _ = []
returnBrand (g:gs) ws = (groupName g, brands g ws) : returnBrand gs ws
  where
    brands grec = map brand . filter (\w -> hotelId grec == site w)

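To see what this pairing produces without any CSV files, here is a small sketch of the same logic written as a list comprehension over in-memory data. The types `G` and `W`, the function `pairBrands`, and the sample values are hypothetical, cut down to the fields the join needs.

```haskell
-- Hypothetical, cut-down records carrying only the fields the join uses.
data G = G { gGroup :: String, gHotel :: Integer }
data W = W { wBrand :: String, wSite :: Integer }

-- Same shape as returnBrand: one entry per Google record, listing every
-- Wyndham brand whose site equals that record's hotel id.
pairBrands :: [G] -> [W] -> [(String, [String])]
pairBrands gs ws =
  [ (gGroup g, [wBrand w | w <- ws, wSite w == gHotel g]) | g <- gs ]

-- Made-up sample data.
sampleG :: [G]
sampleG = [G "Wyndham" 1, G "Ramada" 2, G "Baymont" 3]

sampleW :: [W]
sampleW = [W "Wyndham" 1, W "Redmond" 3]

-- A Google record with no matching site gets an empty brand list.
result :: [(String, [String])]
result = pairBrands sampleG sampleW
-- [("Wyndham",["Wyndham"]),("Ramada",[]),("Baymont",["Redmond"])]
```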
Now for a `main` action to test out the work:

In [20]:
main = do  
  google <- BL.readFile "./../google.csv"
  wyndham <- BL.readFile "./../wyndham.csv"
  let grecords = googleRecords google
  let wrecords = wyndhamRecords wyndham
  print $ returnBrand grecords wrecords
  
main

[(" Wyndham",["Wyndham"]),(" Wyndham",[]),(" Ramada",[]),(" Baymont",["Redmond"])]