## 1-3. Setup and Exploration

In [29]:
# Importing libraries
import pandas as pd
import numpy as np

# Loading in data
df = pd.read_csv('data/Colorado River Basin Water Conflict Table.csv')

In [3]:
# Adjusting settings to see all of the columns, and then viewing first 3 rows
pd.set_option("display.max_columns", None)
df.head(3)

Unnamed: 0,Event,Search Source,Newspaper,Article Title,Duplicate,Report Date,Report Year,Event Date,Event Day,Event Month,Event Year,Conflict Present,Crisis Present,Basin,HUC6,HUC2,Place,County,County FIPS,State,State FIPS,Urban or Rural,Issue Type,Event Summary,Stakeholders,Intensity Value,Comments,Related Observation Themes,Article Text Search - water quality,Article Text Search - invasive species,Article Text Search - conservation,Article Text Search - drought,Article Text Search - flood,Article Text Search - ground water depletion,Article Text Search - depletion,Article Text Search - infrastructure,Article Text Search - fish passage,Article Text Search - instream water rights,Article Text Search - water rights,Article Text Search - intergovernmental,Article Text Search - water transfers,Article Text Search - navigation,Article Text Search - fish,Article Text Search - invasive,Article Text Search - diversion,Article Text Search - water diversion,Article Text Search - instream,Article Text Search - aquatic
0,1,USGS1-50.docx,The Durango Herald (Colorado),Tribes assert water rights on Colorado River B...,False,7-Apr-22,2022.0,,,4.0,2022.0,Y,N,Upper San Juan,140801,14,"Durango, CO",La Plata,8067.0,CO,8,Both,Water rights more generally,Ute Mountain and Southern Ute representatives ...,"Tribal Nations, State Government, Federal Gove...",2.0,The article highlights calls for negotiation b...,Lack of tribal representation,0,0,3,7,0,0,0,1,0,0,17,0,0,0,0,0,0,0,0,0
1,2,USGS1-50.docx,"Journal, The (Cortez, Dolores, Mancos, CO)",Native American tribes assert water rights on ...,False,7-Apr-22,2022.0,,,4.0,2022.0,Y,N,Upper San Juan,140801,14,"Durango, CO",La Plata,8067.0,CO,8,Both,Water rights more generally,Ute Mountain and Southern Ute representatives ...,"Southern Ute Indian Tribe, Ute Mountain Tribe,...",2.0,The article highlights calls for negotiation b...,Lack of tribal representation,0,0,2,7,0,0,0,1,0,0,17,0,0,0,0,0,0,0,0,0
2,3,USGS1-50.docx,The Salt Lake Tribune,'Very positive change.' New Utah law will be a...,False,17-Mar-22,2022.0,,,3.0,2022.0,N,Y,Great Salt Lake,160203,16,Great Salt Lake,,,UT,49,Both,Instream water rights,A bill is proposed in Utah that would expand t...,"State Government, Any Water Rights Holder, Agr...",3.0,The event is the proposal of the bill at the s...,Dishonoring the absent,0,0,1,2,0,0,0,0,0,0,12,0,0,0,1,0,0,0,12,1


In [28]:
# Data exploration
print(df['Basin'].unique())
print(df['Basin'].nunique())

['Upper San Juan' 'Great Salt Lake' 'Lower Colorado'
 'Entire Lower Colorado Basin' 'Entire Colorado River Basin' 'Lower Green'
 'South Platte' 'Lower Colorado-Lake Mead' 'Entire Upper Colorado Basin'
 'Middle Gila' 'Upper Colorado-Dirty Devil' 'Colorado Headwaters' nan
 'Little Colorado' 'White-Yampa' 'Lower Gila-Agua Fria' 'Upper Green'
 'Santa Cruz' 'Salt' 'Salton Sea'
 'Upper San Juan, Salton Sea, Salt, Rio De La Concepcion, Lower San Juan, Upper Colorado-Dirty Devil, Bill Williams, Middle Gila, Santa Cruz, Verde, Lower Gila, San Pedro-Willcox, Rio Sonoyta, Lower Colorado, Little Colorado, Lower Colorado-Lake Mead, Upper Gila, Rio De Bavispe, Lower Gila-Agua Fria '
 'Upper Colorado-Dirty Devil, Upper San Juan'
 'Upper Colorado-Dirty Devil, Lower San Juan, Little Colorado, Lower Colorado-Lake Mead'
 'Salt, Lower Colorado' 'Upper Gila' 'Lower San Juan']
25
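
The array printed above has 26 entries, while `nunique()` reports 25. The difference is the `nan` entry: `unique()` returns missing values as one of the results, whereas `nunique()` skips them unless told otherwise. A minimal sketch on a made-up Series (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Basin' column (values are illustrative only)
basins = pd.Series(['Salt', 'Upper Gila', np.nan, 'Salt'])

n_with_nan = len(basins.unique())              # unique() keeps NaN as a value
n_without_nan = basins.nunique()               # nunique() drops NaN by default
n_counting_nan = basins.nunique(dropna=False)  # ask nunique() to count NaN too

print(n_with_nan, n_without_nan, n_counting_nan)
```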


In [26]:
# More data exploration
print(df.shape)
# print(df.dtypes)

(268, 48)


In [27]:
# More exploration
# df.info()  # commented out because the output is long and hard to read
df.isna().sum()  # missing values per column: isna() returns Booleans, and sum() counts each True as 1
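
With 48 columns, the full `isna().sum()` output is long. One possible way to shorten it, sketched here on a hypothetical toy frame (column names borrowed from the dataset, values made up), is to keep only the columns that have at least one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration
toy = pd.DataFrame({
    'State': ['CO', np.nan, 'UT'],
    'County': [np.nan, np.nan, 'La Plata'],
    'HUC2': [14, 14, 16],
})

na_counts = toy.isna().sum()  # each True counts as 1, each False as 0

# Keep only columns with missing values, largest count first
na_only = na_counts[na_counts > 0].sort_values(ascending=False)
print(na_only)
```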

## 5. String accessor for `pandas.Series`

In [30]:
s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Nevada; Utah'])
s

0    California; Nevada
1               Arizona
2                   NaN
3          Nevada; Utah
dtype: object

In [31]:
# str accessor (doesn't do anything by itself)
s.str

<pandas.core.strings.accessor.StringMethods at 0x7fba46702f50>

In [32]:
# Use str accessor with additional methods to perform string operations
# .split splits strings by ';' and expands output into separate columns
s.str.split(';', expand=True)

Unnamed: 0,0,1
0,California,Nevada
1,Arizona,
2,,
3,Nevada,Utah


In [33]:
# Use stack() method to flatten the data frame into a series
# default is to drop NAs and None from result
s.str.split(';', expand=True).stack()

0  0    California
   1        Nevada
1  0       Arizona
3  0        Nevada
   1          Utah
dtype: object
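
Splitting on `';'` alone leaves a leading space on every piece after the first (`' Nevada'`, `' Utah'`), which is easy to miss in the right-aligned output. As an alternative sketch, not the approach used in this notebook, the same flattening can be done with `Series.explode()` followed by `str.strip()`:

```python
import numpy as np
import pandas as pd

s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Nevada; Utah'])

# Without expand=True, split() returns one list per row;
# explode() then gives every list element its own row
parts = s.str.split(';').explode()

# Remove surrounding whitespace and drop the row that was NaN to begin with
clean = parts.str.strip().dropna()
print(clean)
```

Unlike `stack()`, `explode()` repeats the original row label for each piece instead of building a two-level index.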

## 6. Examine state codes

Which states in the dataset are reported as having water conflicts?

In [41]:
# keeping only the events (rows) where a conflict is reported
conflict = df[df['Conflict Present'] == 'Y']

# checking unique state values
conflict['State'].unique()

# oh no, there are observations with multiple states!

array(['CO', nan, 'AZ', 'OH; UT', 'UT', 'CA', 'AZ; NV', 'CO; UT; WY; NM',
       'AZ; CA', 'AZ; UT', 'NV; AZ', 'AZ; CA; CO; NV; NM; UT; WY', 'NV',
       'NM', 'UT; CO; WY', 'AZ; NM', 'WY; UT; CO', 'CO; AZ'], dtype=object)

The challenge with finding unique states this way is that the 'State' column can hold several states in a single string (for example: 'WY; UT; CO').

We have to split up these entries first, before we proceed.

## 8. Exploratory wrangling

In [49]:
# splitting on ';' alone leaves a leading space on every state after the first,
# so the same state shows up more than once (for example, 'CO' and ' CO')
conflict['State'].str.split(';', expand = True).stack()
conflict['State'].str.split(';', expand = True).stack().unique()

array(['CO', 'AZ', 'OH', ' UT', 'UT', 'CA', ' NV', ' WY', ' NM', ' CA',
       'NV', ' AZ', ' CO', 'NM', 'WY'], dtype=object)

In [50]:
# fix: split on '; ' (semicolon plus space) so no leading space is left behind
conflict['State'].str.split('; ', expand = True).stack().unique()

# all good :)

array(['CO', 'AZ', 'OH', 'UT', 'CA', 'NV', 'WY', 'NM'], dtype=object)
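
Splitting on `'; '` works here because every entry uses exactly one space after the semicolon. A slightly more defensive sketch, assuming some entries might be spaced inconsistently, splits on `';'` and strips whitespace afterwards. The Series below is a toy stand-in for `conflict['State']`: it reuses a few values from the output above and adds one hypothetical entry (`'NV;AZ'`) with no space.

```python
import numpy as np
import pandas as pd

# Toy stand-in for conflict['State']; 'NV;AZ' is a hypothetical entry
# added to show inconsistent spacing
states = pd.Series(['CO', np.nan, 'OH; UT', 'AZ; NV', 'NV;AZ'])

# Split on ';' alone, flatten, drop missing pieces, then strip whitespace
# so that 'NV' and ' NV' collapse into the same code
codes = states.str.split(';', expand=True).stack().dropna().str.strip()

unique_codes = sorted(codes.unique())
counts = codes.value_counts()  # how many rows mention each state

print(unique_codes)
print(counts['AZ'])
```

The explicit `dropna()` is there so the result does not depend on how a given pandas version's `stack()` treats missing values.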