# Group 9 Project Data Pre-Processing Description - Ocean Acidification

### We plan to research how increased ocean acidification affects the broader coastal ecosystem. This breaks down into two specific research questions:
#### 1) Is the increase in ocean acidity associated with a net decrease in coastal biodiversity?
#### 2) What are the socio-economic and health-related impacts of ocean acidification?
#### By socio-economic impacts, we mean the societal and human consequences of changes in ocean and marine health, such as effects on the fishing industry and on coastal resilience efforts.

```python
import pandas as pd

# Both files were placed in the Group 9 project directory.
codap_df = pd.read_csv(r'CODAP_NA_v2021.csv', low_memory=False)
zooplankton_df = pd.read_excel(r'BATS_zooplankton.xlsx')  # reading .xlsx requires the openpyxl package
print(codap_df)
print(zooplankton_df)
```
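A per-column count of missing values is a quick way to check how much data is actually missing. The helper below is a minimal sketch: `missing_summary` is a hypothetical name, and it assumes that, besides NaN, some columns may encode missing values with a numeric fill code such as -999. That convention is an assumption, so the dataset documentation should be checked before relying on the sentinel list. The small toy frame stands in for `codap_df` or `zooplankton_df`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame, sentinels=(-999, -9999)) -> pd.DataFrame:
    """Count missing values per column, treating assumed sentinel codes as NaN."""
    # Assumption: -999 / -9999 are fill values for "no data" in some columns.
    cleaned = df.replace(list(sentinels), np.nan)
    counts = cleaned.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(cleaned) * 100).round(1),
    })
    # Most-incomplete columns first.
    return summary.sort_values("pct_missing", ascending=False)


# Toy stand-in for the real data frames.
toy = pd.DataFrame({
    "pH": [7.9, -999, 8.0, np.nan],
    "DIC": [2000.0, 2010.0, 1995.0, 2005.0],
})
print(missing_summary(toy))
```

On the real data this would be called as `missing_summary(codap_df)` or `missing_summary(zooplankton_df)`.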

Neither dataset has major gaps in its data. Each contains only raw measurements of several oceanographic variables, the most important of which are:
**CODAP_NA_v2021**: 
This dataset includes discrete measurements from 2003 to 2018; an image showing measurement coverage is included in the project directory. Geographically, the data cover the U.S. West Coast, U.S. East Coast, Gulf of Mexico, Gulf of Alaska, Bering Sea, North Atlantic Ocean, and North Pacific Ocean. The data originate from the NOAA Ocean Acidification Program (OAP).
* General location data
    - dates, latitude, longitude, cruise identification numbers
* Dissolved inorganic carbon (DIC)
    - marine carbon can be organic or inorganic and dissolved or particulate; DIC is the dissolved inorganic fraction (dissolved CO2, bicarbonate, and carbonate)
* Total alkalinity
* pH
* Continuous and discrete seawater pCO2
    - the partial pressure of CO2 in seawater, a measure of dissolved CO2
* Carbonate ion concentrations
    - measured spectrophotometrically
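* Aragonite and calcite saturation states
    - aragonite and calcite are the two main calcium carbonate minerals used by shell-building organisms; the saturation state (Ω) is derived from the dissolved carbonate ion concentration, and values below 1 favor dissolution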
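* Revelle factor
    - ratio of the fractional change in seawater pCO2 to the fractional change in total dissolved inorganic carbon (DIC); a higher Revelle factor means a lower capacity to take up additional CO2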
* Oxygen concentration
* Apparent oxygen utilization (AOU)
    - the difference between the equilibrium saturation concentration of dissolved oxygen (in water with the same physical and chemical properties) and the measured dissolved oxygen concentration
* Dissolved nutrient concentrations important for marine life
    - silicate, phosphate, nitrate, nitrite, ammonium
* CTD measurements
    - conductivity, temperature, and depth; also includes salinity (derived from conductivity) and pressure
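Several derived quantities in the list above reduce to short formulas. The sketch below writes three of them out as a sanity check on the definitions: the saturation state Ω = [Ca²⁺][CO₃²⁻] / K*sp, a finite-difference estimate of the Revelle factor from two samples, and AOU as saturation O2 minus measured O2. The function names and every input number are hypothetical placeholders, not values from CODAP_NA_v2021. Real K*sp values and oxygen saturation concentrations depend on temperature, salinity, and pressure and would have to come from published parameterizations (for example via a carbonate-system package such as PyCO2SYS).

```python
# Illustrative formulas only: inputs are hypothetical placeholder values,
# not measurements from CODAP_NA_v2021.


def saturation_state(ca: float, co3: float, ksp: float) -> float:
    """Omega = [Ca2+][CO3 2-] / K*sp (concentrations in mol/kg, K*sp in (mol/kg)^2).

    Omega > 1: supersaturated (carbonate mineral formation favored).
    Omega < 1: undersaturated (dissolution favored).
    """
    return ca * co3 / ksp


def revelle_factor_fd(pco2_a: float, pco2_b: float, dic_a: float, dic_b: float) -> float:
    """Finite-difference Revelle factor between states a and b:
    (delta pCO2 / pCO2) / (delta DIC / DIC), with state a as the reference."""
    rel_pco2 = (pco2_b - pco2_a) / pco2_a
    rel_dic = (dic_b - dic_a) / dic_a
    return rel_pco2 / rel_dic


def apparent_oxygen_utilization(o2_saturation: float, o2_measured: float) -> float:
    """AOU = O2 at equilibrium saturation minus measured O2 (same units, e.g. umol/kg)."""
    return o2_saturation - o2_measured


# Placeholder inputs chosen only to keep the arithmetic easy to follow.
print(saturation_state(ca=0.01, co3=1e-4, ksp=5e-7))    # about 2.0
print(revelle_factor_fd(400.0, 410.0, 2000.0, 2005.0))  # about 10.0
print(apparent_oxygen_utilization(250.0, 200.0))        # 50.0
```

The finite-difference Revelle factor is only an approximation of the derivative-based definition; a carbonate-system solver would normally compute it directly from the measured variables.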

**BATS_zooplankton**:
This dataset comes from the Bermuda Institute of Ocean Sciences (Bermuda Atlantic Time-series Study, BATS) and primarily measures zooplankton biomass. Data were collected from 1994 to 2020 with discrete sampling. One of the major hypotheses that this study aims to address is: *changes in seawater CO2-carbonate chemistry and ocean acidification indicators at BATS are the longest record in the global ocean and comparable to the six other globally distributed time-series (Bates et al., 2012; 2014)*, so this dataset is relevant to our data analysis and visualization focus.
* General location data
* Maximum sampling depth
* Volume of water sampled
    - the volume of water filtered for each sample, needed to convert biomass into a concentration (e.g. mg per m³) so that samples of different sizes can be compared
* Weight of zooplankton biomass
    - wet and dry weights, weight-to-volume ratios, and total weights summed across all zooplankton size fractions
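Because samples filter different volumes of water, biomass is usually compared after normalizing by the volume filtered. The sketch below sums the dry weights of each size fraction and divides by that volume. The function `normalize_biomass` and the column names (`vol_filtered_m3`, `dry_mg_frac_small`, `dry_mg_frac_large`) are hypothetical and would need to be mapped to the actual headers in BATS_zooplankton.xlsx.

```python
import pandas as pd


def normalize_biomass(df: pd.DataFrame, weight_cols: list[str],
                      volume_col: str = "vol_filtered_m3") -> pd.DataFrame:
    """Add total dry weight across size fractions and dry weight per m^3 filtered.

    Column names are hypothetical; map them to the real BATS headers first.
    """
    out = df.copy()
    # min_count: a sample missing any size fraction gets NaN instead of an
    # artificially low total.
    out["total_dry_mg"] = out[weight_cols].sum(axis=1, min_count=len(weight_cols))
    out["dry_mg_per_m3"] = out["total_dry_mg"] / out[volume_col]
    return out


# Toy stand-in for zooplankton_df (hypothetical values).
samples = pd.DataFrame({
    "vol_filtered_m3": [500.0, 250.0, 100.0],
    "dry_mg_frac_small": [300.0, 100.0, 50.0],
    "dry_mg_frac_large": [200.0, 400.0, float("nan")],
})
result = normalize_biomass(samples, ["dry_mg_frac_small", "dry_mg_frac_large"])
print(result[["total_dry_mg", "dry_mg_per_m3"]])
```

Propagating NaN for incomplete samples is a deliberate choice here; dropping `min_count` would instead treat a missing fraction as zero biomass.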