### Docs

In [1]:
from IPython.display import Markdown, display

display(Markdown("dor_databank.md"))

# MA [DOR Databank](https://www.mass.gov/municipal-databank-data-analytics)

The [Division of Local Services’ Data Analytics and Resources Bureau - DLS](https://www.mass.gov/orgs/division-of-local-services) analyzes and distributes data related to local government. All analytics use the data submitted to DLS by individual cities, towns, special purpose districts, regional school districts, and state and federal agencies.

The DOR Databank is organized into five groups:

1. Revenue and Expenditures
2. Municipal Debt
3. Property Taxes
4. Socioeconomic
5. Local Aid and Taxes

   
Each Databank group has 3-22 different spreadsheets, for a total of 45 different types of time series, generally covering all 351 Massachusetts cities and towns for a given fiscal year (ending June 30), with upwards of 30 years of history.  Each spreadsheet column is a time series.  There are 524 different time series, generally indexed by ```<municipality> <year>```.  These time series range from Police, Fire and Education expenditures to annual changes in the CPI.

## Extract, Transform and Load.

The Selenium extract uses the report URLs and webpage option IDs stored in the database table ``` common.dor_databank_definitions```, copied below.  Each report URL in the table is wrapped with the following prefix and suffix:

```python

    url_prefix = "https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport="
    
    url_suffix = "&rdSubReport=True"
    
```
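As a minimal sketch of how the prefix and suffix combine with a value from the `url` column of the definitions table: `report_url` is a hypothetical helper, not part of the original extract, and passing absolute URLs (such as the MunicipalDebt download link) through unchanged is an assumption.

```python
url_prefix = "https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport="
url_suffix = "&rdSubReport=True"


def report_url(report: str) -> str:
    """Return the full report URL for a value from the `url` column.

    Hypothetical helper: entries that are already absolute URLs are
    assumed to be direct downloads and are returned unchanged.
    """
    if report.startswith(("http://", "https://")):
        return report
    return f"{url_prefix}{report}{url_suffix}"


print(report_url("ScheduleA.GeneralFund"))
```

The printed URL is the General Funds report address that a Selenium driver would open.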

#### DOR Databank Selenium Extract Definitions:

|group|series_type|url|year|dropdown|button|tab|transpose|tableau|pagination|
|--------|-----------|---|----|--------|------|---|---------|-------|----------|
|RevenueExpenses|GeneralFunds|ScheduleA.GeneralFund|islYear|islAmountType|btnSubmit||||xtGenFund-NextPageCaption|
|RevenueExpenses|FedGrants|ScheduleA.Special_Rev_Funds.SpecialRevFunds|islYear|islAmountType|btnSubmit|FedGrants|||xtFedGrants-NextPageCaption|
|RevenueExpenses|StateGrants|ScheduleA.Special_Rev_Funds.SpecialRevFunds|islYear|islAmountType|btnSubmit|StateGrants|||xtStateGrants-NextPageCaption|
|RevenueExpenses|RRA|ScheduleA.Special_Rev_Funds.SpecialRevFunds|islYear|islAmountType|btnSubmit|RRA|||xtRecResApp-NextPageCaption|
|RevenueExpenses|RevFunds|ScheduleA.Special_Rev_Funds.SpecialRevFunds|islYear|islAmountType|btnSubmit|RevFunds|||xtRevFunds-NextPageCaption|
|RevenueExpenses|OtherSpecRev|ScheduleA.Special_Rev_Funds.SpecialRevFunds|islYear|islAmountType|btnSubmit|OtherSpecRev|||xtOtherSpRev-NextPageCaption|
|RevenueExpenses|CapitalFunds|ScheduleA.CapitalProjects.CapitalProjects|islYear|islAmountType|btnSubmit||||xtCapProjects-NextPageCaption|
|RevenueExpenses|TrustFunds|ScheduleA.TrustFunds.TrustFunds|islYear|islAmountType|btnSubmit||||xtTrustFunds-NextPageCaption|
|RevenueExpenses|EnterpriseFunds|ScheduleA.EnterpriseFunds.EnterpriseFunds|islYear|islAmountType|btnSubmit||||xtEntFunds-NextPageCaption|
|RevenueExpenses|HealthInsurance|ScheduleA.HealthInsurance.HealthInsExpenditures|checkbox||||yes||ctHealthExp-NextPageCaption|
|RevenueExpenses|StabFunds|Dashboard.TrendAnalysisReports.StabFund|checkbox||btnAdvisorSubmit||||tblStabilization-NextPageCaption|
|RevenueExpenses|EmployeeWages|ScheduleA.PesonnelExpenditures.PersonnelExpenditures|islYear||btnSubmit|||mixed|ctPerExp-NextPageCaption|
|RevenueExpenses|SnowIce|BalanceSheet.SnowIce|checkbox||btnSubmit||||tblSnowIce-NextPageCaption|
|RevenueExpenses|TaxRecap|TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est|checkbox|||||||
|Debt|MunicipalDebt|https://www.mass.gov/doc/fy2020-fy2022-debt-analysis/download||||||||
|Debt|BondRatings|DLS_bond_ratings|checkbox|islCompany|btnSubmit||yes||xtblBondRatings-NextPageCaption|
|Debt|CertifiedFreeCash|Dashboard.Cat_1_Reports.CertifiedFreeCashBudget351|checkbox||btnSubmit||||TblCFC_PerBudg-NextPageCaption|
|Debt|RetainedEarnings|BalanceSheet.EntFundRetainedEarnings|checkbox||btnSubmit||yes||CrsTblEntFund-NextPageCaption|
|Debt|FreeCashProof|BalanceSheet.FreecashProofComp||islMuniCCP|btnSubmit|||||
|Debt|StabFunds351|Dashboard.Cat_1_Reports.StablPerBudget351|checkbox||btnSubmit||||TblStabl_PerBudg-NextPageCaption|
|PropertyTax|TaxRates|PropertyTaxInformation.taxratesbyclass.taxratesbyclass|checkbox||btnSubmit||||tbl_taxratesbyclass-NextPageCaption|
|PropertyTax|TaxRatesSpecial|Districts.Tax_Rates_by_Class|checkbox||btnSubmit||||tblDistTaxRateClass-NextPageCaption|
|PropertyTax|TaxLevy|Districts.Levy_By_Class_Data|checkbox||btnSubmit||||tbltaxlevybyclassdis-NextPageCaption|
|PropertyTax|NewGrowth|NewGrowth.NewGrowth_dash_v2_test|checkbox||btnSubmit|||mixed|tblNewGrowth-NextPageCaption|
|PropertyTax|OverlayReserve|Dashboard.Cat_1_Reports.OL1PerLevy351|checkbox||btnSubmit||||TblOverlayPerLevy-NextPageCaption|
|PropertyTax|PropertyTax|PropertyTaxInformation.TaxLevies.LeviesByClass|checkbox||btnSubmit||||tblTaxlevybyclass-NextPageCaption|
|PropertyTax|TaxLeviesSpecial|Districts.Levy_By_Class_Data|checkbox||btnSubmit||||tbltaxlevybyclassdis-NextPageCaption|
|PropertyTax|AverageSingleFamilyTaxBill|AverageSingleTaxBill.SingleFamTaxBill_wRange|checkbox||btnSubmit|yes|||tblSinglefamtaxbill-NextPageCaption|
|PropertyTax|AssessedValues|PropertyTaxInformation.AssessedValuesbyClass.assessedvaluesbyclass|checkbox||btnSubmit|yes|||tblassessedvalues-NextPageCaption|
|PropertyTax|AssessedValuesSpecial|Districts.Assessed_Value_By_Class|checkbox||btnSubmit||||tblassessedvalclassdis-NextPageCaption|
|PropertyTax|ExemptValues|LA4.Totals|checkbox|islMuni|btnSubmit|||mixed||
|PropertyTax|EqualizedValuations|PropertyTaxInformation.EQV.EQV|checkbox||btnSubmit|yes|yes|||
|PropertyTax|MotorVehicleExciseTax|TaxRateRecap.PAGE3.Subreports.MV_Act_Est|checkbox||btnSubmit||yes||CxtblMV_Est-NextPageCaption|
|PropertyTax|EstActReceipts|TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est|checkbox||btnSubmit|||||
|PropertyTax|ParcelCounts|PropertyTaxInformation.LA4.Parcel_counts_vals|islYear||btnSubmit|parcel_counts|||xtParcels-NextPageCaption|
|PropertyTax|ParcelValues|PropertyTaxInformation.LA4.Parcel_counts_vals|islYear||btnSubmit|parcel_valuations|||xtVals-NextPageCaption|
|PropertyTax|RevenueSources|RevenueBySource.RBS.RevbySource2|checkbox||btnSubmit||||dtCurrent-NextPageCaption|
|PropertyTax|CIPTaxShift|TaxRate.CIP_TaxShift|checkbox||btnSubmit||||tblCIP_TaxShift-NextPageCaption|
|PropertyTax|Overrides|Votes.Prop2_5.OverrideUnderride|||btnSubmit||||tblProp2_5Votes-NextPageCaption|
|PropertyTax|CapitalExclusion|Votes.Prop2_5.Capital|||btnSubmit||||tblProp2_5Votes-NextPageCaption|
|PropertyTax|DebtExclusion|Votes.Prop2_5.DebtExclusionLevyAmt|||btnSubmit||||TblDebtExcLevyAmt-NextPageCaption|
|PropertyTax|AllDebtExclusion|Votes.Prop2_5.DebtExclusionVotes|||btnSubmit||||tblProp2_5Votes-NextPageCaption|
|PropertyTax|SpecialPurposeStabFund|Votes.Prop2_5.Stabilization|||btnSubmit||||tblProp2_5Votes-NextPageCaption|
|Socioeconomic|DORIncome|DOR_Income_EQV_Per_Capita|islYear||btnAdvisorSubmit||||xtblDOR_Income_EQV_Per_Capita-NextPageCaption|
|Socioeconomic|HousingDensity|Socioeconomic.HousingSqMIle|1999,2009||btnSubmit||||tblHousingSqMile-NextPageCaption|
|Socioeconomic|HouseholdIncome|Socioeconomic.MedHouseholdFamInc|1999||btnSubmit|||||
|Socioeconomic|Population|Socioeconomic.Population.Population|checkbox||btnSubmit||yes||xtblPopulation-NextPageCaption|
|Socioeconomic|CPI|Socioeconomic.consumer.consumerpriceindex|||||yes|||
|Socioeconomic|LaborForce|Dashboard.TrendAnalysisReports.LaborForce|checkbox||btnAdvisorSubmit||||tblLaborForce-NextPageCaption|
|Socioeconomic|MotorVehicles|Socioeconomic.MotorVehicles|checkbox||btnSubmit||||tblMotorVehicle-NextPageCaption|
|Socioeconomic|RegisteredVoters|Socioeconomic.RegisteredVoters|checkbox||btnSubmit||||tblRegVoter-NextPageCaption|
|Socioeconomic|RoadMiles|Socioeconomic.RoadMIles|checkbox||btnSubmit||||tblRoadMiles-NextPageCaption|
|Socioeconomic|ResidentBirths|Number_of_Resident_Births|checkbox||btnSubmit||yes||xt_BirthNumbs-NextPageCaption|
|LocalAidTaxes|CherrySheetsAssessments|CherrySheets.CherrySheet_detail&amp;rdLinkDataLayers=CherrySheets.cherrysheetdetail_main|islYear|islRecChrg|btnBudgetType|||islBudgetType|xtCherrySheet-NextPageCaption|
|LocalAidTaxes|CherrySheetsReceipts|CherrySheets.CherrySheet_detail&amp;rdLinkDataLayers=CherrySheets.cherrysheetdetail_main|islYear|islRecChrg|btnBudgetType|||islBudgetType|xtCherrySheet-NextPageCaption|
|LocalAidTaxes|CherrySheetsDORIncomeEQV|DOR_Income_EQV_Per_Capita|islYear||btnSubmit||||xtblDOR_Income_EQV_Per_Capita-NextPageCaption|
|LocalAidTaxes|LocalMealsTax|Local_Option_Meals_Rooms|checkbox2||btnSubmit||||xt_meals-NextPageCaption|
|LocalAidTaxes|LocalRoomsTax|Local_Option_Meals_Rooms|checkbox2||btnSubmit||||xt_rooms-NextPageCaption|
|LocalAidTaxes|LocalWeedTax|Local_Option_Meals_Rooms|checkbox2||btnSubmit||||xt_ImpactFee-NextPageCaption|


## Database schema

Database tables created:

1. governance.dor_databank
    * ```<dor><year><series><value>```   
2. common.int_value_pairs
    * ```<key><item><value>```
    * ```where int_value_pairs.key = dor_databank_series with item='series'```
    
There are more than 2 million observations of financial statement data, almost entirely of integer type.  A z-score is computed for each time series within a given year across all municipalities.
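The per-year z-score can be sketched with a pandas group-by. The rows below are invented toy data, and the use of the population standard deviation (`ddof=0`) is an assumption of this sketch; the original load may compute it differently.

```python
import pandas as pd

# Toy rows in the <dor><year><series><value> layout; the numbers are invented.
df = pd.DataFrame(
    {
        "dor": [1, 2, 3, 1, 2, 3],
        "year": [2020, 2020, 2020, 2021, 2021, 2021],
        "dor_databank_series": [7, 7, 7, 7, 7, 7],
        "value": [10, 20, 30, 5, 5, 20],
    }
)

# One group per (year, series), so each z-score compares a municipality
# with all other municipalities for the same series in the same year.
grouped = df.groupby(["year", "dor_databank_series"])["value"]
mean = grouped.transform("mean")
# ddof=0 (population standard deviation) is an assumption of this sketch.
std = grouped.transform(lambda s: s.std(ddof=0))

df["zscore"] = (df["value"] - mean) / std
print(df)
```

A series with identical values across all municipalities in a year has zero standard deviation, so its z-score comes out as NaN and would need separate handling.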

A single key-value table is used to store all the data with schema:

```sql
        -- Table: governance.dor_databank

        -- DROP TABLE IF EXISTS governance.dor_databank;

        CREATE TABLE IF NOT EXISTS governance.dor_databank
        (
            dor smallint NOT NULL,
            year smallint NOT NULL,
            dor_databank_series smallint NOT NULL,
            value bigint,
            zscore real,
            CONSTRAINT dor_databank_pkey PRIMARY KEY (dor, year, dor_databank_series)
        )

        TABLESPACE pg_default;

        ALTER TABLE IF EXISTS governance.dor_databank
            OWNER TO polis;

```
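To illustrate how a wide spreadsheet (one column per time series) might be reshaped into rows matching the schema above, here is a small pandas sketch. The column names, values, and the `series_keys` mapping are invented for illustration; in the real database the mapping from series name to integer key would come from `common.int_value_pairs`.

```python
import pandas as pd

# A toy "spreadsheet": one row per municipality and year, one column per series.
wide = pd.DataFrame(
    {
        "dor": [10, 35],
        "year": [2022, 2022],
        "Police": [100, 200],
        "Fire": [50, 75],
    }
)

# Hypothetical series-name -> integer-key mapping, standing in for
# the rows of common.int_value_pairs with item='series'.
series_keys = {"Police": 1, "Fire": 2}

# Unpivot the series columns into (series, value) pairs.
tidy = wide.melt(id_vars=["dor", "year"], var_name="series", value_name="value")
tidy["dor_databank_series"] = tidy["series"].map(series_keys)
tidy = tidy[["dor", "year", "dor_databank_series", "value"]]
print(tidy)
```

Each row of `tidy` then corresponds to one `(dor, year, dor_databank_series)` primary key in `governance.dor_databank`.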

## Databank detail

Below is a brief description of the DOR databank and links to the source data.



## [Schedule A - Revenues and Expenditures](https://www.mass.gov/lists/schedule-a-reports-revenues-expenditures-and-more)

The Schedule A is a statement of revenues, expenditures, and other year-end financial information prepared annually by the local accountant or auditor.

### Revenues

* [General Funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.GenFund_MAIN)

The general fund is used to account for most financial resources and activities governed by the normal town meeting or city council appropriation process.

* [Special Funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.Special_Rev_Funds.SpecialRevFunds)

Funds established by statute only, containing revenues that are earmarked for and restricted expenditures for specific purposes.

* [Capital Funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.CapitalProjects.CapitalProjects)

Capital projects funds and bonding during fiscal year.

* [Trust Funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.TrustFunds.TrustFunds)

Funds for money donated or transferred to a municipality with specific instructions on its use.

* [Enterprise Funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.EnterpriseFunds.EnterpriseFunds)

An enterprise fund is a separate accounting and financial reporting mechanism for municipal services for which a fee is charged in exchange for goods or services.


### Expenditures

* [Health Insurance](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.HealthInsurance.HealthInsExpenditures)

Schedule A, Parts 1 and 6, Health Insurance Expenditures


* [Stabilization funds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.TrendAnalysisReports.StabFund)

* [Wages](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=ScheduleA.PesonnelExpenditures.PersonnelExpenditures)

Total Full Time Equivalent (FTE) Municipal Employees and Wages - Not by Department

* [Snow&Ice](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=BalanceSheet.SnowIce)

By statute, snow and ice expenditures are allowed to exceed their budget.

### Tax Receipts

Estimated and actual local receipts are reported to DLS annually on the Tax Rate Recapitulation form. The report below compares a community's estimated and actual local receipts for various revenue types. Please note that only estimated (budgeted) receipt data is available for the current fiscal year.

* [Tax Rate Recap Receipts](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est)


## Debt

Key financial indicators and trends related to bond ratings, borrowing and the management of municipal debt.

* [Bond Ratings](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=DLS_bond_ratings)

Bond ratings issued by [Moody's](https://www.mass.gov/media/1676896/download) and [Standard & Poor's](https://www.mass.gov/media/1676891/download).

* [Certified Free Cash](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.Cat_1_Reports.CertifiedFreeCashBudget351)

Free cash is a revenue source which results from the calculation, as of July 1, of a community's remaining, unrestricted funds from operations of the previous fiscal year based on the balance sheet as of June 30. Free cash is offset by property tax receivables and certain deficits and can be a negative number.

* [Retained Earnings](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=BalanceSheet.EntFundRetainedEarnings)

Free Cash certified from an enterprise fund is referred to as Retained Earnings. This can be used for purposes including capital improvements, reimbursing the general fund for prior year subsidies or reducing user fees.

* [Free Cash Proof](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=BalanceSheet.FreecashProofComp)

The Free Cash Calculation Report shows the data used to determine the amount of free cash certified as of June 30th of a specific fiscal year.

* [StabFunds](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.Cat_1_Reports.StablPerBudget351)

Stabilization is a reserve fund - sometimes referred to as a "rainy day fund" - designed to accumulate funds that can later be appropriated for any lawful purpose.

* [Municipal Debt Analysis](https://www.mass.gov/doc/fy2020-fy2022-debt-analysis/download)

Report detailing municipal debt.  Useful terms:


1. Long term debt consists of Bonds, USDA Rural Development Loans, Serial Notes and Refunding Notes.
2. Short term debt consists of [Bond Anticipation Notes](https://www.investopedia.com/terms/b/bondanticipationnote.asp) (BAN), Federal Aid Anticipation Notes (FAAN), [Revenue Anticipation Notes](https://www.investopedia.com/terms/r/ran.asp) and [State Aid Anticipation Notes (SAAN)](https://www.lawinsider.com/dictionary/state-bond-anticipation-note).
3. Most long term debt issues range between 5 and 20 years, while short term issues are typically for one year or less.
4. A community's debt limit equals 5 percent of the most recent EQV. Prior to FY05, the municipal debt limit was 2.5 percent for cities and 5 percent for towns. The [Municipal Relief Act (Chapter 46, Section 32 of the Acts of 2003)](https://malegislature.gov/Laws/SessionLaws/Acts/2003/Chapter46) changed the debt limit to 5 percent for all cities and towns.
5. The long term retired column refers to bond issues that have either matured or been "called in." The long term interest, short term interest and other interest columns refer to interest payments made this year on bond issues.
6. Total Outstanding Debt refers to the remaining principal payments that have not been paid off as of July 1 of the current fiscal year.


## Property Tax

Tax rates, assessed values, levies, Proposition 2 1/2 referendum votes and other data related to property taxes.

* [TaxRatesMuni](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=PropertyTaxInformation.taxratesbyclass.taxratesbyclass_main)

City and town tax rates from FY 2003 to the present. Depending on the options chosen each year by the select board or city council, property classes can be taxed using a single rate or different rates. 


* [TaxRatesSpecial](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Districts.Tax_Rates_by_Class)

* [TaxLevy](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Districts.Levy_By_Class)

* [NewGrowth](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=NewGrowth.NewGrowth_dash_v2_test)

* [OverlayReserve](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.Cat_1_Reports.OL1PerLevy351)

* [ExcessOverrideCapacity1](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Prop2.5.ExcessLevyCapandOverride_MAIN)

* [ExcessOverrideCapacity2](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Prop2.5.ExcessLevyCapandOverride_03_09)

### Property Tax Levies and Average Single Family Tax Bills 

* [PropertyTax](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.TrendAnalysisReports.TaxLevyByClass)

Property tax levies by the five major property classes (residential, open space, commercial, industrial and personal property) as reported on page 1 of the annual tax rate recapitulation sheet for municipalities.


* [TaxLeviesSpecial](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Districts.Levy_By_Class)

* [AverageSingleFamilyTaxBill](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=AverageSingleTaxBill.SingleFamTaxBill_wRange)

Reports the average single family tax bill for all communities and ranks those bills by fiscal year from the highest to lowest in the Commonwealth.

### Assessed Property Values

* [AssessedValues](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=PropertyTaxInformation.AssessedValuesbyClass.assessedvaluesbyclass)

Assessed total property values by residential, open space, commercial, industrial and personal property types for cities and towns as reported by the local board of assessors through the DLS Gateway application on the LA-4 form.


* [AssessedValuesSpecial](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Districts.Assessed_Value_By_Class)

Assessed total property values by residential, open space, commercial, industrial and personal property types for cities and towns as reported by the **special purpose taxing districts'** assessors through the DLS Gateway application on the LA-4 form.

* [ExemptValues](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=LA4.Totals)

### Other Property Tax Related Reports

* [EqualizedValuations](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=PropertyTaxInformation.EQV.EQV)
* [MotorVehicleExciseTax](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=TaxRateRecap.PAGE3.Subreports.MV_Act_Est)
* [EstActReceipts](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est)
* [ParcelCountsValues](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=PropertyTaxInformation.LA4.Parcel_counts_vals)
* [ParcelCountsValuesSpecial](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Districts.parcel_count_by_type)
* [RevenueSources](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=RevenueBySource.RBS.RevbySourceMAIN)
* [CIPTaxShift](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=TaxRate.CIP_TaxShift)

### Proposition 2 1/2 Referendum Data

* [Overrides](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Votes.Prop2_5.OverrideUnderride)
* [CapitalExclusion](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Votes.Prop2_5.Capital)
* [DebtExclusion](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Votes.Prop2_5.DebtExclusionLevyAmt)
* [AllDebtExclusion](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Votes.Prop2_5.DebtExclusionVotes)
* [SpecialPurposeStabFund](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Votes.Prop2_5.Stabilization)


## Socioeconomic Data - Income, Population and Housing Data

Select socioeconomic data from the Department of Revenue (DOR), Massachusetts Department of Transportation (MassDOT), Secretary of State's office, Department of Public Health (DPH), Department of Work Force and Labor (DWFL) and the US Census Bureau. Key economic condition indicators for cities, towns and counties.

Data relative to income, population and housing from both the DOR and the US Census Bureau.  Income is presented in the Income, EQV and Population report which is used in the formula allocation of certain cherry sheet programs and the calculation of school building assistance rates.  Other income, population and housing data relates to the US Census surveys.

* [DORIncome](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=DOR_Income_EQV_Per_Capita)
* [HousingDensity](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.HousingSqMIle)
* [MedHouseholdIncome](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.MedHouseholdFamInc)
* [Population](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.Population.population_main)
* [CPI](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.consumer.consumerpriceindex_main)
* [LaborForce](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Dashboard.TrendAnalysisReports.LaborForce)
* [MotorVehicles](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.MotorVehicles)
* [RegisteredVoters](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.RegisteredVoters)
* [RoadMiles](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Socioeconomic.RoadMIles)
* [ResidentBirths](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Number_of_Resident_Births)


## Local Aid/Cherry Sheets

Cherry sheet estimates, monthly local aid and other payments managed by DLS, and Municipal Revenue Growth Factors (MRGF) data:

* Annually, the Commissioner of Revenue must provide cherry sheet estimates, which are the best estimate of the amount of state aid and assessments.  Boards of assessors are required to use these estimates in determining their local budgets.
* Monthly local aid payments, CPA state match, smart growth school cost reimbursement and property tax exemption reimbursements.
* Municipal revenue growth factors (MRGFs) are a component used by the Department of Elementary and Secondary Education in determining the annual allocation of the Chapter 70 aid cherry sheet program.


* [CherrySheets](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=CherrySheets.cherrysheetdetail_main)

Estimated state aid to be received and assessments due by community, or by year and program.

* [CherrySheetsDORIncomeEQV](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=DOR_Income_EQV_Per_Capita)

* [LocalTaxes](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Local_Option_Meals_Rooms)

Monthly local aid and other payments managed by the Division of Local Services

* [CPAAdopt](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Local_Option_CPA)
* [CPA](https://www.mass.gov/doc/fy2024-community-preservation-act-state-match/download)


## Local Taxes
* [LocalOptions](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=LocalOptions.localoptions)

Municipalities may adopt certain local option statutes that will impact the assessment of local property taxes and appear in the Local Options report. 

* [MarijuanaTaxAdoption](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=LocalOptions.Local_Options_Tax)
* [LocalTaxes](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=Local_Option_Meals_Rooms)
* [RoomTaxAdoption](https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport=LocalOptions.Room_Tax_Impact_Fee)



#### Issues

1. Enterprise Funds extract: the aspx report has a bug where the year-select dropdown disappears for later years
    * first fix: sort the years in reverse
    * selecting on type (Revenues in place of Expenditures) then also left just a single year showing
    * second fix: run expenditures and revenues separately
2. EmployeeWages next page icon refreshes current page
    * fix by comparing tables in current and previous frame and exiting when they are the same
3. CherrySheets
    * "Final Budget" not default - manually set
    * submit button has different id
    * no fiscal year in columns - added during transform
4. Parcels: next-page and tab selection collide
    * inside the pagination while loop, insert a click on the "Parcel Valuations" tab
5. Revenue By Source
    * RevenueSources uses checkbox handler id = iclYear2_handler
    * data came out garbled; only the first tab (2003 - current) was extracted
6. LaborForce missing DOR Code
    * only available from 2012 to current; older version from 1990
7. Registered Voters stale as of 2012
8. Override/Debt Exclusion votes
    * table header uses links; use table css id to isolate table of interest
    * same for Population
9. HealthInsurance has no checkbox for additional years
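The workaround in issue 2 (stop paginating when the "next page" click returns the same table) can be sketched independently of Selenium. `collect_pages`, `read_page` and `next_page` are hypothetical names used only for this illustration; in the real extract the two callbacks would wrap `pd.read_html` and a click on the next-page element, and the comparison would use `DataFrame.equals`.

```python
from typing import Callable, List


def collect_pages(
    read_page: Callable[[], List[list]],
    next_page: Callable[[], None],
    max_pages: int = 1000,
) -> List[list]:
    """Collect table rows page by page, stopping when a page repeats."""
    rows: List[list] = []
    previous = None
    for _ in range(max_pages):
        current = read_page()
        if current == previous:
            break  # "next page" only refreshed the page we already have
        rows.extend(current)
        previous = current
        next_page()
    return rows


# Stand-in for a browser session: three pages, after which "next" has no effect.
pages = [[["a", 1]], [["b", 2]], [["c", 3]]]
position = {"i": 0}


def read_page() -> List[list]:
    return pages[position["i"]]


def next_page() -> None:
    position["i"] = min(position["i"] + 1, len(pages) - 1)


result = collect_pages(read_page, next_page)
print(result)
```

The `max_pages` cap is a safety net added in this sketch so that a report whose pages never repeat cannot loop forever.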


### Set-Up

In [None]:
url_prefix = "https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport="
url_suffix = "&rdSubReport=True"

databank = {
    "RevenueExpenses": {
        "GeneralFunds"     :  "ScheduleA.GeneralFund",
        "SpecialFunds"     :  "ScheduleA.Special_Rev_Funds.SpecialRevFunds",
        "CapitalFunds"     :  "ScheduleA.CapitalProjects.CapitalProjects",
        "TrustFunds"       :  "ScheduleA.TrustFunds.TrustFunds",
        "EnterpriseFunds"  :  "ScheduleA.EnterpriseFunds.EnterpriseFunds",
        "HealthInsurance"  :  "ScheduleA.HealthInsurance.HealthInsExpenditures",
        "StabFunds"        :  "Dashboard.TrendAnalysisReports.StabFund",
        "EmployeeWages"    :  "ScheduleA.PesonnelExpenditures.PersonnelExpenditures",
        "SnowIce"          :  "BalanceSheet.SnowIce",
        "TaxRecap"         :  "TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est",
    },
    "Debt" : {
        "BondRatings"     :  "DLS_bond_ratings",
        "CertifiedFreeCash":  "Dashboard.Cat_1_Reports.CertifiedFreeCashBudget351",
        "RetainedEarnings" :  "BalanceSheet.EntFundRetainedEarnings",
        "FreeCashProof"    :  "BalanceSheet.FreecashProofComp",
        "StabFunds351"     :  "Dashboard.Cat_1_Reports.StablPerBudget351",
        "MunicipalDebtAnalysis" : "",
    },
    "PropertyTax" : {
        "TaxRates":"PropertyTaxInformation.taxratesbyclass.taxratesbyclass_main",
        "TaxRatesSpecial":"Districts.Tax_Rates_by_Class",
        "TaxLevy":"Districts.Levy_By_Class",
        "NewGrowth":"NewGrowth.NewGrowth_dash_v2_test",
        "OverlayReserve":"Dashboard.Cat_1_Reports.OL1PerLevy351",
        #"ExcessOverrideCapacity1":"Prop2.5.ExcessLevyCapandOverride_MAIN",
        #"ExcessOverrideCapacity2":"Prop2.5.ExcessLevyCapandOverride_03_09",
        "PropertyTax":"Dashboard.TrendAnalysisReports.TaxLevyByClass",
        "TaxLeviesSpecial":"Districts.Levy_By_Class",
        "AverageSingleFamilyTaxBill":"AverageSingleTaxBill.SingleFamTaxBill_wRange",
        "AssessesValues":"PropertyTaxInformation.AssessedValuesbyClass.assessedvaluesbyclass",
        "AssessedValuesSpecial":"Districts.Assessed_Value_By_Class",
        "ExemptValues":"LA4.Totals",
        "EqualizedValuations":"PropertyTaxInformation.EQV.EQV",
        "MotorVehicleExciseTax":"TaxRateRecap.PAGE3.Subreports.MV_Act_Est",
        "EstActReceipts":"TaxRateRecap.PAGE3.LocalReceiptsAct_vs_Est",
        "ParcelCountsValues":"PropertyTaxInformation.LA4.Parcel_counts_vals",
        #"ParcelCountsValuesSpecial":"Districts.parcel_count_by_type",
        "RevenueSources":"RevenueBySource.RBS.RevbySourceMAIN",
        "CIPTaxShift":"TaxRate.CIP_TaxShift",
        "Overrides":"Votes.Prop2_5.OverrideUnderride",
        "CapitalExclusion":"Votes.Prop2_5.Capital",
        "DebtExclusion":"Votes.Prop2_5.DebtExclusionLevyAmt",
        "AllDebtExclusion":"Votes.Prop2_5.DebtExclusionVotes",
        "SpecialPurposeStabFund":"Votes.Prop2_5.Stabilization",
    },
    "Socioeconomic" : {
        "DORIncome"      :  "DOR_Income_EQV_Per_Capita",
        "HousingDensity" :  "Socioeconomic.HousingSqMIle",
        "HouseholdIncome":  "Socioeconomic.MedHouseholdFamInc",
        "Population"     :  "Socioeconomic.Population.population_main",
        "CPI"            :  "Socioeconomic.consumer.consumerpriceindex_main",
        "LaborForce"     :  "Dashboard.TrendAnalysisReports.LaborForce",
        "MotorVehicles"  :  "Socioeconomic.MotorVehicles",
        "RegisteredVoters" :  "Socioeconomic.RegisteredVoters",
        "RoadMiles"      :  "Socioeconomic.RoadMIles",
        "ResidentBirths" :  "Number_of_Resident_Births",
    },
    "LocalAidTaxes" : {
        "CherrySheets":"CherrySheets.cherrysheetdetail_main",
        "CherrySheetsDORIncomeEQV":"DOR_Income_EQV_Per_Capita",
        "LocalTaxes":"Local_Option_Meals_Rooms",
        "CPA":"",
    },
}
  

In [None]:
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv (
        find_dotenv (
            usecwd=True
        ),
    override=True
) # read local .env file and override any existing

from sqlalchemy import create_engine
from os import environ

username     =  environ.get("POSTGRES_USERNAME", "postgres")
password     =  environ.get("POSTGRES_PASSWORD", "postgres")
ipaddress    =  environ.get("POSTGRES_IPADDRESS", "localhost")
port         =  environ.get("POSTGRES_PORT", "5432")
dbname       =  environ.get("POSTGRES_DBNAME", "ArlingtonMA")

#establish database connection for Transform queries and Loads
cnx= create_engine(f'postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}')
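One caveat with building the connection string by f-string: a password containing characters such as `@` or `/` would be read as URL delimiters. A minimal sketch of escaping the credentials with the standard library follows; the credentials and the `example_*` names are invented for illustration.

```python
from urllib.parse import quote_plus

# Invented credentials; the "@" and "/" would otherwise be parsed as URL delimiters.
example_user = "polis"
example_password = "p@ss/word"

safe_url = (
    f"postgresql://{quote_plus(example_user)}:{quote_plus(example_password)}"
    "@localhost:5432/ArlingtonMA"
)
print(safe_url)
```

The resulting string can be passed to `create_engine` in place of the unescaped one.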

In [None]:
import pandas as pd
import numpy as np
import time
import shutil


from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException


### Extract

In [None]:
def start_up(url, headless=False):
    """Start a Firefox WebDriver session, open `url`, and return the driver."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--start-maximized")

    if headless:
        options.add_argument("--headless")
    
    geckodriver_path = "/snap/bin/geckodriver"  # specify the path to your geckodriver
    driver_service = Service(executable_path=geckodriver_path)
    
    driver = webdriver.Firefox(options=options, service=driver_service)

    driver.get(url)

    return driver


def check_exists_by_id(driver,id):
    try:
        driver.find_element(By.ID,id)
    except NoSuchElementException:
        return False
    return True

#//*[@id="tblSinglefamtaxbill-NextPageCaption"]
# def combine_pages_SFTB(driver):
#     id = "tblSinglefamtaxbill-NextPageCaption"
#     all_df=pd.DataFrame()
#     while True:
#         time.sleep(1)

#         df = pd.read_html(driver.page_source,
#                           match="Single Family Values",
#                           skiprows=9)[0]        
#         all_df = pd.concat([all_df,df])
#         if check_exists_by_id(driver,id)==True:
#             driver.find_element(By.ID,id).click()
#             time.sleep(1)
#         else:
#             break
#     return all_df

# def combine_pages_Parcels(
#     driver,
#     id = "xtParcels-NextPageCaption",
#     table_column = "Single Family 101",
#     skiprows = 4
# ):
#     all_df=pd.DataFrame()
#     while True:
#         time.sleep(1)

#         df = pd.read_html(driver.page_source,
#                           match=table_column,
#                           skiprows=skiprows)[0]        
#         all_df = pd.concat([all_df,df])
#         if check_exists_by_id(driver,id)==True:
#             driver.find_element(By.ID,id).click()
#             time.sleep(1)
#         else:
#             break
#     return all_df


# def combine_pages_old(driver):
#     id = "xtGenFund-NextPageCaption"
#     all_df=pd.DataFrame()
#     while True:
#         df = pd.read_html(driver.page_source)[3].iloc[6:]
#         all_df = pd.concat([all_df,df])
#         if check_exists_by_id(driver,id)==True:
#             driver.find_element(By.ID,id).click()
#             time.sleep(1)
#         else:
#             break
#     return all_df

def combine_pages(driver,id):
    all_df=pd.DataFrame()
    while True:
        df = pd.read_html(driver.page_source)[3].iloc[6:]
        all_df = pd.concat([all_df,df])
        if check_exists_by_id(driver,id):
            driver.find_element(By.ID,id).click()
            time.sleep(1)
        else:
            break
    return all_df

def combine_pages_all(driver,id):
    all_df=pd.DataFrame()
    prev_df = pd.DataFrame()
    table_id = id.replace("-NextPageCaption","")
    print(id, table_id)

    while True:

        ## General Fund
        element = WebDriverWait(driver,10)\
                     .until(EC.presence_of_element_located((By.ID, table_id)))
        if table_id == 'xtVals':
            ## hack for multitab parcels valuations, include wait on parcel_valuations above as well.
            driver.find_element(By.ID,"parcel_valuations").click()

        ## pd.read_html raises ValueError when no table matches the pattern
        try:
            df = pd.read_html(driver.page_source,match="DOR Code")
        except ValueError:
            try:
                df = pd.read_html(driver.page_source,match="DOR CODE")
            except ValueError:
                break
        
        if len(df)==1:
            df=df[0]
        else:
            df=df[1]
            
        ## fix for EmployeeWages where next page id continues to exist after large page selected
        if df.equals(prev_df):
            break

        all_df = pd.concat([all_df,df])
        prev_df = df
        
            
        if check_exists_by_id(driver,id):
            driver.find_element(By.ID,id).click()
            time.sleep(1)
        else:
            break
            
    return all_df


def checkbox_all(driver,button='btnSubmit'):
    
    driver.find_element(By.ID,'iclYear_handler').click() ##open dropdown
    driver.find_element(By.ID,'iclYear_check_all').click() ##select check all
    driver.find_element(By.ID,'iclYear_handler').click() ##collapse dropdown
    driver.find_element(By.ID,button).click()
    time.sleep(5)  ##36 or more years takes some time

    
# def select_checkbox_series(driver,button,id_pagination):
    
#     checkbox_all(driver,button=button)
    
#     df = combine_pages_all(driver,id_pagination)
#     driver.close()
    
#     return df
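
The two stopping rules in `combine_pages_all` (stop when the next-page control is gone, or when the page just read equals the previous one) can be exercised without a browser. The sketch below uses a hypothetical `FakePager` in place of the Selenium driver; `FakePager` and `collect_pages` are illustrative names, not part of the notebook.

```python
class FakePager:
    """Stand-in for a paginated report (hypothetical, for illustration only)."""

    def __init__(self, pages, sticky_next=False):
        self.pages = pages
        self.index = 0
        # sticky_next mimics a report whose next-page control never disappears
        self.sticky_next = sticky_next

    def read(self):
        return self.pages[self.index]

    def has_next(self):
        return self.sticky_next or self.index < len(self.pages) - 1

    def click_next(self):
        # Clicking on the last page leaves the same page on screen
        self.index = min(self.index + 1, len(self.pages) - 1)


def collect_pages(pager):
    rows, prev = [], None
    while True:
        page = pager.read()
        if page == prev:          # same page twice: the control was stale
            break
        rows.extend(page)
        prev = page
        if not pager.has_next():  # control gone: last page reached
            break
        pager.click_next()
    return rows


pages = [["a", "b"], ["c", "d"], ["e"]]
normal = collect_pages(FakePager(pages))
sticky = collect_pages(FakePager(pages, sticky_next=True))
print(normal, sticky)
```

One caveat of the duplicate check: two consecutive pages that are genuinely identical would also end the loop early.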
    

In [None]:
# def metrics(df,series_type,cnx,series_metric=None):
#     if 'series_metric' in df.columns:
        
#         query = """
#             select key,value from common.int_value_pairs where item='series_metric'
#             and key in (select distinct series_metric from governance.financials 
#             where series_type = {series_type});
#         """.format(series_type=series_type)
    
#         int_value_pairs = pd.read_sql_query(query,cnx)
#         int_value_pairs['value']=int_value_pairs['value'].str.replace('general_fund_','')
#         int_value_pairs['value']=int_value_pairs['value'].str.replace('*','',regex=False)
    
    
#         df=df.merge(int_value_pairs,right_on='value',left_on='series_metric',how='left')
#         assert(pd.isnull(df['key']).any()==False)

#         df = df\
#             .drop(['series_metric','value_y'],axis=1)\
#             .rename(columns={'key':'series_metric','value_x':'value'})\
#             [['dor','year','series_metric','value']]\
#             .sort_values(['dor','series_metric','year'])\
#             .reset_index(drop=True)
#     else:
#         df['series_metric']=series_metric
        
    
#     df['series_type']=series_type

#     from scipy.stats import zscore
#     from scipy.stats.mstats import winsorize

#     winsorize(df['value'], limits=[0.05, 0.05])

#     df['zscore']=0

#     mask = df['value'].apply(type).isin([int,float])
#     df.loc[mask,'zscore']=df[mask].groupby(['series_metric','year'])['value']\
#         .transform(lambda x : zscore(winsorize(x, limits=[0.05, 0.05]).astype(float),ddof=0))
    
#     return df
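
The commented-out `metrics` helper scores each value within its `(series_metric, year)` group. A standard-library sketch of that grouping with a population z-score (the equivalent of `ddof=0`) is shown below. Winsorizing is omitted, the sample values are made up, and returning `0.0` for a constant group is an assumption of this sketch rather than what `scipy.stats.zscore` does.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_zscores(records):
    """records: list of dicts with 'series_metric', 'year' and 'value' keys."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["series_metric"], rec["year"])].append(rec["value"])

    # Population mean and standard deviation per (series_metric, year) group
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    scored = []
    for rec in records:
        mu, sigma = stats[(rec["series_metric"], rec["year"])]
        # A constant group has sigma == 0; report 0.0 instead of dividing by zero
        z = (rec["value"] - mu) / sigma if sigma else 0.0
        scored.append({**rec, "zscore": z})
    return scored


sample = [
    {"dor": 1, "series_metric": "Police", "year": 2022, "value": 10},
    {"dor": 2, "series_metric": "Police", "year": 2022, "value": 20},
    {"dor": 3, "series_metric": "Police", "year": 2022, "value": 30},
    {"dor": 1, "series_metric": "Fire", "year": 2022, "value": 5},
]
scored = group_zscores(sample)
for row in scored:
    print(row)
```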

In [None]:
# def extract_general_fund(url,id_pagination = "xtGenFund-NextPageCaption", CUT_OFF_YEAR=2021):

#     driver = start_up(url)
#     time.sleep(2)

#     id = 'islAmountType'
#     amount_types=[]
#     for elem in Select(driver.find_element(By.ID,'islAmountType')).options:
#         amount_types.append(elem.get_attribute("value"))

#     id = 'islYear'
#     years=[]
#     for elem in Select(driver.find_element(By.ID,id)).options:
#         years.append(elem.get_attribute("value"))

#     years = [x for x in years if x >= str(CUT_OFF_YEAR)]

#     data = {}
#     for amount_type in amount_types:
#         data[amount_type]=pd.DataFrame()
#         select = Select(driver.find_element(By.ID,'islAmountType'))
#         select.select_by_visible_text(amount_type)
#         for year in years:
#             select = Select(driver.find_element(By.ID,'islYear'))
#             select.select_by_visible_text(year)
#             #time.sleep(1)

#             element = WebDriverWait(driver,10)\
#                          .until(EC.presence_of_element_located((By.ID, id)))
#             driver.find_element(By.ID,'btnSubmit').click()
#             time.sleep(5)
#             tmp_df = combine_pages(driver,id_pagination)
#             data[amount_type] = pd.concat([data[amount_type],tmp_df])

#     driver.close()

#     return data

def extract_fund(
    url,
    id_pagination = "xtGenFund-NextPageCaption",
    id_dropdown = "islAmountType",
    tab = "",
    CUT_OFF_YEAR=2021
):

    driver = start_up(url)
    time.sleep(15)

    amount_types=[]
    
    try:
        for elem in Select(driver.find_element(By.ID,id_dropdown)).options:
            amount_types.append(elem.get_attribute("value"))
    except:
        pass

    print('amount_types',amount_types)

    id = 'islYear'
    years=[]
    for elem in Select(driver.find_element(By.ID,id)).options:
        years.append(elem.get_attribute("value"))
        
    years.sort()
    years = years[::-1]
    years = [x for x in years if x >= str(CUT_OFF_YEAR)]
    print('years',years)

    data = {}
    try:
        if len(amount_types)>0:
            for amount_type in amount_types:
                data[amount_type]=pd.DataFrame()
                select = Select(driver.find_element(By.ID,id_dropdown))
                select.select_by_visible_text(amount_type)
                if len(tab)>0:
                    driver.find_element(By.ID,tab).click()
                    time.sleep(1)
                for year in years:
                    element = WebDriverWait(driver,10)\
                                 .until(EC.presence_of_element_located((By.ID, id)))
                    select = Select(driver.find_element(By.ID,'islYear'))
                    select.select_by_visible_text(year)
                    try:
                        select = Select(driver.find_element(By.ID,'islBudgetType'))
                        select.select_by_visible_text("Final Budget")
                    except:
                        pass
    
                    try:
                        driver.find_element(By.ID,'btnSubmit').click()
                    except:
                        try:
                            driver.find_element(By.ID,"btnAdvisorSubmit").click()
                        except:
                            print("click button?")
                            driver.find_element(By.ID,"btnBudgetType").click()
                            time.sleep(20)
    
                    time.sleep(5)
    
                    tmp_df = combine_pages_all(driver,id_pagination)
                    tmp_df['year']=year
                    data[amount_type] = pd.concat([data[amount_type],tmp_df])
        else:
            data['Expenditures'] = pd.DataFrame()
            for year in years:
                element = WebDriverWait(driver,10)\
                             .until(EC.presence_of_element_located((By.ID, id)))
                select = Select(driver.find_element(By.ID,'islYear'))
                select.select_by_visible_text(year)
    
                try:
                    driver.find_element(By.ID,'btnSubmit').click()
                except:
                    try:
                        driver.find_element(By.ID,"btnAdvisorSubmit").click()
                    except:
                        driver.find_element(By.ID,"btnBudgetType").click()
                time.sleep(5)
                if len(tab)>0:
                    driver.find_element(By.ID,tab).click()
                    time.sleep(1)
                tmp_df = combine_pages_all(driver,id_pagination)
                data['Expenditures'] = pd.concat([data['Expenditures'],tmp_df])

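    except Exception as exc:
        ## report the failure, then fall through to close the driver and return what was collected
        print('extract_fund stopped early:', repr(exc))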
        
    driver.close()

    return data


def extract_funds(all_data):
    ddf = pd.read_csv('/data/municipalities/MA/data/databank_defintions.tsv',sep='\t',dtype=str).replace(np.nan,'')
    
    # 'CherrySheets','CherrySheetsDORIncomeEQV', 'DORIncome', 'EnterpriseFunds',
    categories = ['GeneralFunds', 'CapitalFunds', 'TrustFunds', 
                  'CherrySheetsAssessments',
                  # 'CherrySheetsDORIncomeEQV', 'DORIncome', 'EnterpriseFunds', 
                  'EmployeeWages', 
                  'FedGrants', 'StateGrants', 'RRA', 'RevFunds', 'OtherSpecRev', 
                  'ParcelCounts','ParcelValues']
    for series_type in categories:
    
        url = ddf[ddf.series_type==series_type].url.values[0]
        url = url_prefix + url + url_suffix
    
        pagination = ddf[ddf.series_type==series_type].pagination.values[0]
        tab        = ddf[ddf.series_type==series_type].tab.values[0]
        id_dropdown= ddf[ddf.series_type==series_type].dropdown.values[0]
    
        all_data[series_type] = extract_fund(
            url,
            id_pagination=pagination,
            id_dropdown =id_dropdown,
            tab=tab
        )

    return all_data
    
## NB accepts default years in checkbox (prior 5 years) 
def extract_series(all_data):
    ddf = pd.read_csv('/data/municipalities/MA/data/databank_defintions.tsv',sep='\t',dtype=str).replace(np.nan,'')
    
    categories = ['TaxRecap','StabFunds','SnowIce','CertifiedFreeCash','StabFunds351','TaxRates','TaxRatesSpecial',
                 'TaxLevy','NewGrowth','OverlayReserve','PropertyTax','AverageSingleFamilyTaxBill',
                 'AssessedValues','AssessedValuesSpecial','CIPTaxShift','RevenueSources',
                 'MotorVehicles','RegisteredVoters','RoadMiles']
    
    for series_type in categories:
        url = ddf[ddf.series_type==series_type].url.values[0]
        url = url_prefix + url + url_suffix
        
        id_pagination = ddf[ddf.series_type==series_type].pagination.values[0]
        tab        = ddf[ddf.series_type==series_type].tab.values[0]
        button     = ddf[ddf.series_type==series_type].button.values[0]
        id_dropdown= ddf[ddf.series_type==series_type].dropdown.values[0]
        
        driver = start_up(url)
        time.sleep(15)
    
        all_data[series_type] = combine_pages_all(driver,id_pagination)
    
        driver.close()

    return all_data



def extract_votes(all_data):

    ddf = pd.read_csv('/data/municipalities/MA/data/databank_defintions.tsv',sep='\t',dtype=str).replace(np.nan,'')
    categories = ['Overrides','CapitalExclusion','DebtExclusion','AllDebtExclusion','SpecialPurposeStabFund']
    
    # series_type ='SpecialPurposeStabFund'
    for series_type in categories:
        url = ddf[ddf.series_type==series_type].url.values[0]
        url = url_prefix + url + url_suffix
        
        pagination = ddf[ddf.series_type==series_type].pagination.values[0]
        tab        = ddf[ddf.series_type==series_type].tab.values[0]
        id_dropdown= ddf[ddf.series_type==series_type].dropdown.values[0]
        button     = ddf[ddf.series_type==series_type].button.values[0]
        
        driver = start_up(url)
        time.sleep(10)
        checkbox_all(driver,button=button)
        
        all_data[series_type] = combine_pages_all(driver,pagination)
        driver.close()

    return all_data


def extract_socioeconomic(all_data):

    ddf = pd.read_csv('/data/municipalities/MA/data/databank_defintions.tsv',sep='\t',dtype=str).replace(np.nan,'')
    categories = ['Population','CPI','LaborForce','ResidentBirths','HealthInsurance','BondRatings',
                 'RetainedEarnings']
    
    for series_type in ['BondRatings']:#categories:
        url = ddf[ddf.series_type==series_type].url.values[0]
        url = url_prefix + url + url_suffix
        
        pagination = ddf[ddf.series_type==series_type].pagination.values[0]
        tab        = ddf[ddf.series_type==series_type].tab.values[0]
        id_dropdown= ddf[ddf.series_type==series_type].dropdown.values[0]
        button     = ddf[ddf.series_type==series_type].button.values[0]
        
        driver = start_up(url)
        time.sleep(2)
        
        if series_type == 'CPI':
            df = pd.read_html(driver.page_source)
            all_data['CPI']=pd.merge(df[2], df[3], left_index=True, right_index=True)
            # driver.close()
        else:
            all_data[series_type] = combine_pages_all(driver,pagination)
    
        if series_type == 'BondRatings':
    
            select = Select(driver.find_element(By.ID,id_dropdown))
            select.select_by_visible_text("S&P")
            driver.find_element(By.ID,'btnSubmit').click()
            sp = combine_pages_all(driver,pagination)
            all_data[series_type] = pd.concat([all_data[series_type],sp])
            all_data[series_type]=all_data[series_type].rename(columns={'DOR CODE':'DOR Code'})

        driver.close()  ## release the browser before the next series

    return all_data

def extract_local_taxes(all_data):
    
    def extract_localTaxes(driver):

        driver.find_element(By.ID,"rdCaption_MealTax").click()
    
        meals = pd.DataFrame()
        pagination="xt_meals-NextPageCaption"
    
        elements=driver.find_elements(By.NAME,"iclYearMeals")
        print(len(elements))
        idx = 1
        for elem in elements:
            if idx<=4:#len(elements):
                driver.find_element(By.ID,'iclYearMeals_handler').click() ##open dropdown
                driver.find_element(By.ID,'iclYearMeals_rdList'+str(idx)).click()
                driver.find_element(By.ID,'iclYearMeals_handler').click() ##collapse dropdown
                driver.find_element(By.ID,button).click()
                idx+=1
                tmp_df = combine_pages_all(driver,pagination)
                meals = pd.concat([meals,tmp_df])
    
        driver.find_element(By.ID,"rdCaption_Rooms").click()
    
        rooms = pd.DataFrame()
        pagination="xt_rooms-NextPageCaption"
        
        elements=driver.find_elements(By.NAME,"iclYearRooms")
        print(len(elements))
        idx = 1
        for elem in elements:
            if idx<=4:#len(elements):
                driver.find_element(By.ID,'iclYearRooms_handler').click() ##open dropdown
                driver.find_element(By.ID,'iclYearRooms_rdList'+str(idx)).click()
                year = driver.find_element(By.ID,'iclYearRooms_rdList'+str(idx)).get_attribute('value')
                print(year)
                driver.find_element(By.ID,'iclYearRooms_handler').click() ##collapse dropdown
                driver.find_element(By.ID,button).click()
                idx+=1
                tmp_df = combine_pages_all(driver,pagination)
                tmp_df['Fiscal Year']=year
                rooms = pd.concat([rooms,tmp_df])
    
        weed = pd.DataFrame()
        pagination="xt_ImpactFee-NextPageCaption"
    
        driver.find_element(By.ID,"rdCaption_ImpactFee").click()
    
        elements=driver.find_elements(By.NAME,"iclYearImp")
        print(len(elements))
        idx = 1
        for elem in elements:
            if idx<=4:#len(elements):
                driver.find_element(By.ID,'iclYearImp_handler').click() ##open dropdown
    
                driver.find_element(By.ID,'iclYearImp_rdList'+str(idx)).click()
                year = driver.find_element(By.ID,'iclYearImp_rdList'+str(idx)).get_attribute('value')
                print(year)
                driver.find_element(By.ID,'iclYearImp_handler').click() ##collapse dropdown
                driver.find_element(By.ID,button).click()
                idx+=1
                tmp_df = combine_pages_all(driver,pagination)
                tmp_df['Fiscal Year']=year
                weed = pd.concat([weed,tmp_df])
    
        ##
        df1 = meals[meals['DOR Code'].apply(len)==3].reset_index(drop=True).copy()
        df1['local_taxes_type']='meals'
        df2 = rooms[rooms['DOR Code'].apply(len)==3].reset_index(drop=True).copy()
        df2['local_taxes_type']='rooms'
        df3 = weed[weed['DOR Code'].apply(len)==3].reset_index(drop=True).copy()
        df3['local_taxes_type']='weed'
        localTaxes = pd.concat([df1,df2,df3]).sort_values(['Fiscal Year','DOR Code']).reset_index(drop=True)
    
        return localTaxes

    
    ddf = pd.read_csv('/data/municipalities/MA/data/databank_defintions.tsv',sep='\t',dtype=str).replace(np.nan,'')
    
    for series_type in ['LocalTaxes']:
        url = ddf[ddf.series_type==series_type].url.values[0]
        url = url_prefix + url + url_suffix
        
        pagination = ddf[ddf.series_type==series_type].pagination.values[0]
        tab        = ddf[ddf.series_type==series_type].tab.values[0]
        id_dropdown= ddf[ddf.series_type==series_type].dropdown.values[0]
        button     = ddf[ddf.series_type==series_type].button.values[0]
        
        driver = start_up(url)
        time.sleep(2)
    
    all_data['LocalTaxes']= extract_localTaxes(driver)

    driver.close()

    return all_data
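
Each extractor above repeats the same lookup: find the definitions row for a `series_type`, then wrap its `url` field in `url_prefix` and `url_suffix`. A minimal standard-library sketch of that lookup follows. The two sample rows and the prefix and suffix are copied from the documentation at the top of this notebook, only a subset of the definition columns is shown, and `load_definitions` and `report_url` are hypothetical helper names.

```python
import csv
import io

url_prefix = "https://dlsgateway.dor.state.ma.us/reports/rdPage.aspx?rdReport="
url_suffix = "&rdSubReport=True"

# Two illustrative rows in the same tab-separated layout as the definitions table
SAMPLE_TSV = (
    "group\tseries_type\turl\tyear\tdropdown\tbutton\ttab\tpagination\n"
    "RevenueExpenses\tGeneralFunds\tScheduleA.GeneralFund\tislYear\t"
    "islAmountType\tbtnSubmit\t\txtGenFund-NextPageCaption\n"
    "RevenueExpenses\tFedGrants\tScheduleA.Special_Rev_Funds.SpecialRevFunds\tislYear\t"
    "islAmountType\tbtnSubmit\tFedGrants\txtFedGrants-NextPageCaption\n"
)


def load_definitions(text):
    """Index definition rows by series_type (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["series_type"]: row for row in reader}


def report_url(definition):
    """Build the full report URL for one definitions row."""
    return url_prefix + definition["url"] + url_suffix


definitions = load_definitions(SAMPLE_TSV)
general = definitions["GeneralFunds"]
general_url = report_url(general)
print(general_url, repr(general["tab"]), general["pagination"])
```

Indexing the rows once avoids filtering the full table separately for `url`, `pagination`, `tab`, `dropdown` and `button`. Empty fields come back as empty strings, matching the `.replace(np.nan,'')` convention used above.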

### Transform

In [None]:
def transform_rds(
    df, 
    floats = ['Debt Service as % of Budget','Debt as % of EQV']
):
    df = df.astype(object)
    df = df[~pd.isnull(df['DOR Code'])]
    if int not in df['DOR Code'].apply(type).unique():
        df = df[df['DOR Code'].apply(len)==3]
    
    if 'County' in df.columns:
        df = df.drop('County',axis=1)
    
    df=df.drop('Municipality',axis=1).rename(columns={
        'DOR Code':'dor',
        'FY':'year',
        'Fiscal Year':'year',
    })
    if 'year' in df.columns:
        df=df\
            . melt(['dor','year'],var_name='series_metric')
    else:
        df=df\
            . melt(['dor'],var_name='year')
    
    df=df[~pd.isnull(df['value']) & (df['value']!='____________')]
    
    
    for col in ['dor','year']:
        df[col]=df[col].astype(int)

    if len(floats)>0:
        mask = df.series_metric.isin(floats)
        df.loc[~mask,'value']  = df.loc[~mask,'value'].astype(int)
        df.loc[mask,'value']   = round(100*df.loc[mask,'value'].astype(float),2).astype(int)
    else:
        df['value'] = df['value'].astype(int)
        
    #if 'series_metric' in df.columns:
    #df = metrics(df,series_type,cnx,series_metric)
    
    return df


def transform_revenues_expenditures(data):

    drop_cols = ['Fiscal Year','Municipality','County','LEA Code']
    if 'Revenues' in data:

        revenues      =  data['Revenues']
        for col in drop_cols:
            if col in revenues.columns:
                revenues      =  revenues.drop(col,axis=1)
        mask = revenues['DOR Code'].apply(type)==int
        if mask.any():
            revenues = revenues[mask]
        else:
            mask = revenues['DOR Code'].apply(len)==3
            revenues = revenues[mask]
    
    if 'Expenditures' in data:

        if 'year' not in data['Expenditures'].columns:
            data['Expenditures']['year'] = data['Expenditures']['Fiscal Year']

        expenditures =  data['Expenditures']
        for col in drop_cols:
            if col in expenditures.columns:
                expenditures      =  expenditures.drop(col,axis=1)

        mask = expenditures['DOR Code'].apply(type)==int
        if mask.any():
            expenditures = expenditures[mask]
        else:
            mask = expenditures['DOR Code'].apply(len)==3
            expenditures = expenditures[mask]
        
    if 'Revenues' in data:
        temp_df = revenues.merge(expenditures,on=['DOR Code','year'],how='outer',indicator='matched')
        assert((temp_df.matched=='both').all())
        temp_df=temp_df.drop('matched',axis=1)
    else:
        temp_df = expenditures
        
    temp_df=temp_df.rename(columns={
        'DOR Code':'dor',
    })
    
    
    temp_df = temp_df.melt(['dor','year'],var_name='series_metric')
    
    for col in ['dor','year','value']:
        try:
            temp_df[col]=temp_df[col].fillna(0).astype(int)
        except:
            print('column not all int',col)
    
    return temp_df, expenditures


def revenues_expenditures(all_data):

    ## problems with 'CherrySheetsDORIncomeEQV', 'DORIncome', 'EnterpriseFunds',
    transformed_data = pd.DataFrame()
    
    # series_types = ['GeneralFunds', 'CapitalFunds', 'TrustFunds', 
    #                    'CherrySheetsDORIncomeEQV', 'DORIncome', 'EnterpriseFunds', 
    #                    'EmployeeWages', 'FedGrants', 'StateGrants', 'RRA', 'RevFunds', 
    #                    'OtherSpecRev', 'ParcelCounts', 'ParcelValues']
    
    series_types = ['GeneralFunds', 'CapitalFunds', 'TrustFunds', 
     'EmployeeWages', 'FedGrants', 'StateGrants', 
     'RRA', 'RevFunds', 'OtherSpecRev', 'ParcelCounts', 'ParcelValues']
    for series_type in series_types:
        tmp_df, foo = transform_revenues_expenditures(all_data[series_type])
        tmp_df['series_type']=series_type
        tmp_df.series_metric=tmp_df.series_metric\
            .str.replace('_x',' Revenues')\
            .str.replace('_y',' Expenditures')
    
        assert((tmp_df.value.apply(type)==int).all())   
        transformed_data=pd.concat([transformed_data,tmp_df])
    
    return transformed_data
    
def transform_TaxRecap(df):
    df = df.drop(['Receipt Type ID','Municipality'],axis=1)
    mask = df['DOR Code'].apply(len)==3
    df = df[mask]
    df = df.rename(columns={'DOR Code':'dor','Fiscal Year':'year',
                           'Receipt Description':'series_metric'})

    #df = df . melt(['dor','year','Receipt Description'],var_name='value')

    df1 = df[['dor','year','series_metric','Estimate']].copy()
    df1['series_metric']=df1['series_metric'] + ' Estimate'
    df1.columns = ['dor','year','series_metric','value']

    df2 = df[['dor','year','series_metric','Actual']].copy()
    df2['series_metric']=df2['series_metric'] + ' Actual'
    df2.columns = ['dor','year','series_metric','value']

    df = pd.concat([df1,df2]).sort_values(['dor','year','series_metric']).reset_index(drop=True)
    df = df[~pd.isnull(df['value'])]
    df['dor']=df['dor'].astype(int)
    df['value']=df['value'].astype(int)
    df['series_type'] = 'TaxRecap'
    return df


def transform_PropertyTax(all_data):
   
    df = all_data['TaxRates'].copy()
    df = df[df['DOR Code'].apply(len)==3]

    for col in ['Residential','Open Space',
                'Commercial','Industrial',
                'Personal Property']:
        df[col]=100*df[col].astype(float)

    df = transform_rds(df)
    df['series_type'] = 'TaxRates'
    TaxRates = df.copy()

    ##need to add DOR code to int_value_pair for special tax districts
    df = all_data['TaxRatesSpecial'].copy()

    df = df[df['DOR Code'].apply(len)==3].rename(columns={'Name':'Municipality'})
    for col in ['Residential','Open Space',
                'Commercial','Industrial',
                'Personal Property']:
        df[col]=100*df[col].astype(float)

    df = transform_rds(df)
    df['series_type'] = 'TaxRatesSpecial'
    TaxRatesSpecial = df.copy()

    ##series_type should be TaxLevy special districts
    df = all_data['TaxLevy'].copy()
    df = df.rename(columns={'Name':'Municipality'})
    for col in df.columns:
        if '%' in col:
            df[col]=100*df[col].astype(float)

    df = transform_rds(df)
    df['series_type'] = 'TaxLevy'
    TaxLevy = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)


    df = all_data['NewGrowth'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    for col in df.columns:
        if '%' in col:
            df[col]=100*(df[col].astype(float))
    df = transform_rds(df)
    df['series_type'] = 'NewGrowth'
    NewGrowth = df.copy()


    df = all_data['OverlayReserve'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    for col in df.columns:
        if '%' in col:
            df[col]=100*(df[col].str.replace('%','').astype(float))
    df = transform_rds(df)
    df['series_type'] = 'OverlayReserve'
    OverlayReserve = df.copy()

    df = all_data['PropertyTax'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    for col in df.columns:
        if '%' in col:
            df[col]=100*(df[col].str.replace('%','').astype(float))
    df = transform_rds(df)
    df['series_type'] = 'PropertyTax'
    PropertyTax = df.copy()

    df = all_data['AverageSingleFamilyTaxBill'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    for col in df.columns:
        if '%' in col:
            df[col]=100*(df[col].str.replace('%','').astype(float))
    df = transform_rds(df)
    df['series_type'] = 'AverageSingleFamilyTaxBill'
    AverageSingleFamilyTaxBill = df.copy()

    df = all_data['AssessedValues'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    for col in df.columns:
        if '%' in col:
            df[col]=100*(df[col].str.replace('%','').astype(float))
    df = transform_rds(df)
    df['series_type'] = 'AssessedValues'
    AssessedValues = df.copy()

    df = all_data['AssessedValuesSpecial'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]

    df = df.rename(columns={'Name':'Municipality'})
    df = transform_rds(df)
    df['series_type'] = 'AssessedValuesSpecial'
    AssessedValuesSpecial = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)



    # df = all_data['ParcelCounts']['Expenditures'].copy()
    # df = df[~pd.isnull(df['DOR Code'])]
    # df = df[df['DOR Code'].apply(len)==3]

    # df = transform_rds(df)
    # df['category'] = 'ParcelCounts'
    # ParcelCounts = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    # ParcelCounts

    # df = all_data['ParcelValues']['Expenditures'].copy()
    # df = df[~pd.isnull(df['DOR Code'])]
    # df = df[df['DOR Code'].apply(len)==3]

    # df = transform_rds(df)
    # df['category'] = 'ParcelValues'
    # ParcelValues = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    # ParcelValues


    df = all_data['RevenueSources'].copy()

    # df['series_metric']= 'Revenue Sources'
    cols = pd.Series(df.columns.tolist()).apply(pd.Series).sum(axis=1)
    df.columns =\
        cols.str.replace('Totals Without Enterprise and CPA Funds','without ')\
            .str.replace('Totals With Enterprise and CPA Funds','with ')\
            .str.replace(r'Unnamed: \d+_level_0','',regex=True)
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]

    df=df.drop(['Municipality'],axis=1).rename(columns={
        'DOR Code':'dor',
        'Fiscal Year':'year'
    })
    df = df[df.dor!='Totals:']

    df = df . melt(['dor','year'],var_name='series_metric')

    mask = df.series_metric.str.contains('%')
    df.loc[mask,'value']=round(100*(df.loc[mask,'value']).astype(float).fillna(0),0).astype(int)

    for col in ['dor','year']:
        df[col]=df[col].fillna(0).astype(int)

    df['series_type']= 'RevenueSources'

    RevenueSources = df.copy()


    df = all_data['CIPTaxShift'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3]
    ##6 digits of precision for % columns
    for col in df.columns:
        if '%' in col:
            df[col]=round((10000*(df[col].str.replace('%','').astype(float))),0)
    for col in ['Lowest Residential Factor Allowed',
                'Max CIP Shift Allowed',
                'Residential Factor Selected','CIP Shift']:
        df[col]=round(1e6*(df[col].astype(float)),0)
    df = transform_rds(df)
    df['series_type'] = 'CIPTaxShift'
    CIPTaxShift = df.copy()
    
    return pd.concat( [
            AverageSingleFamilyTaxBill,
            AssessedValues,
            TaxRates,
            TaxRatesSpecial,
            TaxLevy,
            NewGrowth,
            OverlayReserve,
            PropertyTax,
            AssessedValuesSpecial,
            # ParcelCounts,
            # ParcelValues,
            RevenueSources,
            CIPTaxShift
    ])


def normalize_data(DOR_DataBank):

    query = """
    
    select ivp1.key,ivp1.value as series_metric,ivp2.value as series_type
    from common.int_value_pairs ivp1
    left join common.int_value_pairs ivp2 
        on ivp2.item='dor_databank_series_type' and ivp2.key=ivp1.key
    where ivp1.item='dor_databank_series'
    ORDER BY ivp1.key
    ;
    """
    
    xref = pd.read_sql_query(query,cnx)
    
    ##big deal, removing zeros...
    mask = DOR_DataBank.value==0
    DOR_DataBank = DOR_DataBank[~mask]
   
    DOR_DataBank=DOR_DataBank\
        .merge(xref,
               on=['series_type','series_metric'],
               how='left')\
        .rename(columns={"key":"dor_databank_series"})\
        .sort_values(['dor','dor_databank_series','year'])\
        .reset_index(drop=True)\
        [['dor','year','dor_databank_series','value']]
    
    DOR_DataBank = DOR_DataBank[~pd.isnull(DOR_DataBank.dor_databank_series)]

    DOR_DataBank=DOR_DataBank[~DOR_DataBank.duplicated(["dor","year","dor_databank_series"])]
    for col in ['year','dor','dor_databank_series','value']:
        DOR_DataBank[col]=DOR_DataBank[col].astype(int)
    
    mask = DOR_DataBank.dor_databank_series.isin(xref.key)
    assert mask.all()
    
    from scipy.stats import zscore
    
    DOR_DataBank['zscore']=0
    DOR_DataBank['zscore']=DOR_DataBank.groupby(['dor_databank_series','year'])['value'].transform(lambda x : zscore(x.astype(float),ddof=0))
    
    query = """
    
    select * from governance.dor_databank where year>={year};
    
    """.format(year=DOR_DataBank.year.min())
    
    current_dor_databank =  pd.read_sql_query(query,cnx)
    
    mask = current_dor_databank.dor_databank_series.isin(DOR_DataBank.dor_databank_series.unique())
    current_dor_databank=current_dor_databank[mask]
    
    combo = DOR_DataBank.merge(current_dor_databank,how='outer',indicator='matched',on=['dor','year','dor_databank_series'])
    
    missing = combo[combo.matched=='right_only']\
        .merge(xref,right_on='key',left_on='dor_databank_series',how='left')\
        . drop(['value_x','zscore_x','matched'],axis=1)\
        . rename(columns={'value_y':'value','zscore_y':'zscore'})
    for col in ['dor','year','dor_databank_series','value']:
        missing[col]=missing[col].astype(int)
    
    verify = combo[combo.matched=='both']
    mask = verify.value_x!=verify.value_y
    updates=verify[mask].drop(['value_y','zscore_y','matched'],axis=1)\
        .rename(columns={'value_x':'value','zscore_x':'zscore'})
    updates['value']=updates['value'].astype(int)
    
    additions = combo[combo.matched=='left_only']\
        . drop(['value_y','zscore_y','matched'],axis=1)\
        . rename(columns={'value_x':'value','zscore_x':'zscore'})
    for col in ['dor','year','dor_databank_series','value']:
        additions[col]=additions[col].astype(int)
    
    return additions, updates, missing


def transform_votes(all_data):

    votes = pd.DataFrame()
    for key in ['Overrides','CapitalExclusion','DebtExclusion',
                'SpecialPurposeStabFund']:
        tmp = all_data[key].copy()
        tmp = tmp[~pd.isnull(tmp['DOR Code'])]
        tmp = tmp[tmp['DOR Code'].apply(len)==3]
    
        # if key!='Overrides':
        tmp['Vote Type']=key
        # else:
        #     tmp = tmp[tmp['DOR Code'].apply(len)==3]
            
        tmp=tmp.drop('Municipality',axis=1).rename(
            columns={
                'FY Excluded':'year',
                'Vote Type':'vote_type',
                'Yes Vote':'Yes Votes', 'No Vote':'No Votes',
                'Number Yes':'Yes Votes', 
                'Number No':'No Votes',
                'Vote Description':'vote_description',
                'Description':'vote_description',
                'Department':'department',
                'DOR Code':'dor',
                'Fiscal Year':'year',
                'Vote Date':'date',
                'Net Excludable Debt':'amount',
                'Amount':'amount'
            }
        )
    
        mask =\
            (tmp['year'] == '8 Selected') |\
            (tmp['year'] == 'Select Departments:') |\
            (tmp['year'] == 'Fiscal Year') |\
            (pd.isnull(tmp['year']))
        
        tmp = tmp[~mask]
        
        for col in ['dor','year']:
            tmp[col]=tmp[col].astype(int)
            
        tmp['date']=pd.to_datetime(tmp['date'])
    
        votes=pd.concat([votes,tmp])
        
    votes = votes.sort_values(['dor','year','date']).reset_index(drop=True)
    return votes


def transform_socioeconomic(all_data, cnx):

    df = all_data['DORIncome']['Expenditures'].copy()
    df = df[df['DOR Code'].apply(len)==3].drop('LEA Code',axis=1).rename(columns={'Cherry Sheet FY':'year'})
    df = transform_rds(df)
    df['series_type'] = 'DORIncome'
    DORIncome = df.copy()
    

    df = all_data['Population'].copy()
    df = df[df['DOR Code'].apply(len)==3]
    Population = df\
        .drop(['County','Municipality'],axis=1)\
        .rename(columns={'DOR Code':'dor'})\
        .melt('dor',var_name='year')\
        .sort_values(['dor','year'])\
        .reset_index(drop=True)

    Population['series_metric']='Population'
    Population['series_type']='Population'
    




    ## LaborForce is missing DOR Code; cross-reference int_value_pairs by municipality name
    df = all_data['LaborForce'].copy()

    query = """SELECT * FROM common.int_value_pairs WHERE item='dor';"""
    dor = pd.read_sql_query(query,cnx)

    df = df\
            . drop("DOR Code",axis=1)\
            . merge(dor[["key","value"]],
                   how='left',
                   right_on='value',
                   left_on='Municipality')\
            . rename(columns={"key":"DOR Code","Year":"year"})\
            . drop("value",axis=1)
    # Every municipality name must resolve to a DOR code
    assert not pd.isnull(df['DOR Code']).any()

    df['Unemployment Rate']=round(100*df['Unemployment Rate'].astype(float))
    df = transform_rds(df)
    df['series_type'] = 'LaborForce'
    LaborForce = df.copy()
    


    df = all_data['MotorVehicles'].copy()
    df = df[df['DOR Code'].apply(len)==3].rename(columns={'Year':'year'})
    df['Average Age']=round(100*df['Average Age'].astype(float)).astype(int)
    df = transform_rds(df)
    df['series_type'] = 'MotorVehicles'
    MotorVehicles = df.copy()
    


    df = all_data['RegisteredVoters'].copy()
    df = df[~pd.isnull(df['DOR Code'])]
    df = df[df['DOR Code'].apply(len)==3].rename(columns={'Year':'year'})
    df = transform_rds(df)
    df['series_type'] = 'RegisteredVoters'
    RegisteredVoters = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    


    # df = all_data['RoadMiles'].copy()
    # df = df[~pd.isnull(df['DOR Code'])]
    # df = df[df['DOR Code'].apply(len)==3].rename(columns={'Year':'year'})
    # for col in df.columns[3:]:
    #     df[col]=round(100*df[col].astype(float)).astype(int)

    # df = transform_rds(df)
    # df['category'] = 'RoadMiles'
    # RoadMiles = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    

    df = all_data['ResidentBirths'].copy()\
        .drop(['Municipality'],axis=1)\
        .rename(columns={'DOR Code':'dor'})\
        .melt('dor',var_name='year')\
        .sort_values(['dor','year'])\
        .reset_index(drop=True)

    df = df[df['dor'].apply(len)==3]
    for col in ['dor','year','value']:
        df[col]=df[col].astype(int)

    df['series_metric']='ResidentBirths'
    df['series_type']='ResidentBirths'
    ResidentBirths=df.copy()

    df = all_data['CPI'].copy()
    df = df[df.columns[[0,5,6,13,14]]]
    df.columns = ['year','price_deflator','deflator_chg','cpi','cpi_chg']
    df["DOR Code"]=999
    df["Municipality"]="MA"
    df['cpi_chg']=round(100*df['cpi_chg'].astype(float))
    df['deflator_chg']=round(100*df['deflator_chg'].astype(float))
    df = transform_rds(df)
    df['series_type'] = 'CPI'
    CPI = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)


    df = pd.concat([
        DORIncome, 
        Population, 
        LaborForce, 
        MotorVehicles, 
        RegisteredVoters, 
        # RoadMiles,
        ResidentBirths,
        CPI
    ])

    for col in ['dor','year']:
        df[col]=df[col].astype(int)
        
    return df

def other_revenue_expenses(all_data):

    StabFunds=transform_rds(all_data['StabFunds'].copy())
    StabFunds['series_type']='StabFunds'
    assert((StabFunds.value.apply(type)==int).all())

    
    # df = all_data['HealthInsurance'].copy()
    # df=df.drop('Municipality',axis=1).rename(columns={
    #     'DOR Code':'dor'
    # })
    # df = df[df.dor!='Totals:']

    # df = df . melt(['dor'],var_name='year')

    # for col in ['dor','year','value']:
    #     df[col]=df[col].fillna(0).astype(int)

    # df['category']= 'HealthInsurance'
    # df['series_metric']= 'Health Insurance'
    # HealthInsurance = df.copy()

    ## SnowIce

    df = all_data['SnowIce'].copy()
    df = df[df['DOR Code'].apply(len)==3]
    df = transform_rds(df)
    df['value']=df['value'].astype(int)
    df['series_type']='SnowIce'
    SnowIce = df.copy()



    return pd.concat(
        [
            StabFunds,
            SnowIce
        ]
    )
    
def transform_LocalAidTaxes(all_data):
    
    df = all_data['CherrySheetsAssessments']['Assessments'].copy()

    df = df[~pd.isnull(df['DOR Code'])]
    df = transform_rds(df)
    df['series_type'] = 'CherrySheetsAssessments'
    CherrySheetsAssessments = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    
    df = all_data['CherrySheetsAssessments']['Receipts'].copy()

    df = df[~pd.isnull(df['DOR Code'])]
    df = transform_rds(df)
    df['series_type'] = 'CherrySheetsReceipts'
    CherrySheetsReceipts = df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True)
    

    # Split LocalTaxes into one series type per local option tax
    local_taxes = []
    for tax_type, series_type in [('meals','LocalMealsTax'),
                                  ('rooms','LocalRoomsTax'),
                                  ('weed','LocalWeedTax')]:
        df = all_data['LocalTaxes'].copy()
        df = df[df.local_taxes_type==tax_type].drop('local_taxes_type',axis=1)
        df = transform_rds(df)
        df = df[df.value!=0]
        df['series_type'] = series_type
        local_taxes.append(
            df.copy().sort_values(['dor','series_metric','year']).reset_index(drop=True))
    
    return pd.concat([
        CherrySheetsAssessments,
        CherrySheetsReceipts
    ] + local_taxes)


### Load Prop2.5 Votes

In [None]:
def load_votes(votes, cnx):
    
    table_create_votes = \
        """
            DROP TABLE IF EXISTS governance.prop25_votes;
            CREATE TABLE governance.prop25_votes (
                "dor" SMALLINT,
                "year" SMALLINT,
                "date" DATE,
                "Win / Loss" varchar(4),
                "Yes Votes" INT,
                "No Votes" INT,
                "vote_type" VARCHAR(22),
                "amount" BIGINT,
                "vote_description" VARCHAR(255),
                "department" VARCHAR(25)
            );
        """
    cnx.execute(table_create_votes)
    
    votes.to_sql('prop25_votes',schema='governance',con=cnx,
                 if_exists='append',index=False) 
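
The drop, create, then append pattern in `load_votes` can be tried out without the production database. The sketch below is an illustration only: it swaps in an in-memory SQLite connection for `cnx`, uses a reduced, made-up subset of the columns, and omits the `schema='governance'` argument because this stand-in database does not use it.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real connection.
cnx = sqlite3.connect(":memory:")

# Recreate the table with explicit column types before loading.
cnx.executescript("""
    DROP TABLE IF EXISTS prop25_votes;
    CREATE TABLE prop25_votes (
        "dor" SMALLINT,
        "year" SMALLINT,
        "vote_type" VARCHAR(22),
        "amount" BIGINT
    );
""")

# Made-up rows shaped like the output of transform_votes.
votes = pd.DataFrame({
    "dor": [1, 2],
    "year": [2020, 2021],
    "vote_type": ["Overrides", "DebtExclusion"],
    "amount": [500000, 1200000],
})

# if_exists='append' keeps the table definition created above.
votes.to_sql("prop25_votes", con=cnx, if_exists="append", index=False)

loaded = pd.read_sql_query("SELECT * FROM prop25_votes ORDER BY dor", cnx)
print(loaded)
```

Creating the table first and appending afterwards keeps the declared column types, whereas letting `to_sql` create the table would infer them from the DataFrame.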

### Execute ETL

In [None]:
all_data = {}

#### RevenuesExpenditures

In [None]:
print(all_data.keys())
all_data = extract_funds(all_data)
transformed_data = revenues_expenditures(all_data)

TaxRecap = transform_TaxRecap(all_data['TaxRecap'].copy())
transformed_data=pd.concat([transformed_data,TaxRecap])

additions, updates, missing = normalize_data(transformed_data.copy())

# additions.to_sql('dor_databank',schema='governance',con=cnx,
#              if_exists='append',index=False)
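
The `additions, updates, missing` triple returned by `normalize_data` comes from an outer merge with an indicator column: rows found only in the scraped data are additions, rows found only in the stored table are missing, and rows found in both with differing values are updates. Here is a minimal sketch of that classification on made-up data; the frame names `scraped` and `stored` are illustrative, not names from the notebook.

```python
import pandas as pd

# Toy stand-ins for the freshly scraped rows and the rows already stored.
scraped = pd.DataFrame({
    "dor": [1, 1, 2],
    "year": [2020, 2021, 2020],
    "dor_databank_series": [10, 10, 10],
    "value": [100, 250, 300],
})
stored = pd.DataFrame({
    "dor": [1, 1, 3],
    "year": [2020, 2021, 2020],
    "dor_databank_series": [10, 10, 10],
    "value": [100, 200, 400],
})

keys = ["dor", "year", "dor_databank_series"]

# The indicator column records which side(s) each key combination came from.
combo = scraped.merge(stored, how="outer", on=keys, indicator="matched")

additions = combo[combo.matched == "left_only"]   # scraped, not yet stored
missing = combo[combo.matched == "right_only"]    # stored, no longer scraped
both = combo[combo.matched == "both"]
updates = both[both.value_x != both.value_y]      # stored value has changed

print(additions[keys].values.tolist())
print(updates[keys].values.tolist())
print(missing[keys].values.tolist())
```

The overlapping `value` column is suffixed `_x` for the scraped side and `_y` for the stored side, which is why `normalize_data` drops one suffix and renames the other.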

#### Property Tax

In [None]:
all_data = extract_series(all_data)

PropertyTax = transform_PropertyTax(all_data.copy())

# mask = (PropertyTax.series_type == 'RevenueSources')&(PropertyTax.dor==10)&(PropertyTax.year==2021)
# PropertyTax[mask]

additions, updates, missing = normalize_data(PropertyTax.copy())
additions
# additions.to_sql('dor_databank',schema='governance',con=cnx,
#              if_exists='append',index=False)

#### Prop2.5 Votes

In [None]:
all_data = extract_votes(all_data)
votes = transform_votes(all_data)
# load_votes(votes, cnx)

#### Other

In [None]:
Other_RE = other_revenue_expenses(all_data)

additions, updates, missing = normalize_data(Other_RE.copy())
additions
# additions.to_sql('dor_databank',schema='governance',con=cnx,
#              if_exists='append',index=False)

#### SocioEconomic

In [None]:
all_data = extract_socioeconomic(all_data)
SocioEconomic = transform_socioeconomic(all_data.copy(),cnx)
additions, updates, missing = normalize_data(SocioEconomic)
additions
# additions.to_sql('dor_databank',schema='governance',con=cnx,
#              if_exists='append',index=False)

#### Local Aid and Taxes

In [None]:
all_data = extract_local_taxes(all_data)
LocalAidTaxes = transform_LocalAidTaxes(all_data.copy())
additions, updates, missing = normalize_data(LocalAidTaxes)
# additions.to_sql('dor_databank',schema='governance',con=cnx,
#              if_exists='append',index=False)