# Good hunting weather: when to look for morel mushrooms
## Part 1: preparing the data

Prized for their rarity and rich umami flavor, morels inspire more passion than any other edible mushroom. A layer of mystery adds to their allure. Unlike shiitake, lion's mane, oyster mushrooms, or grocery-store button mushrooms, morels have resisted every attempt at profitable cultivation; nobody has yet figured out how to do it. As a result, they remain one of the few foods not available on demand. They must be gathered, not bought.

And so every spring, thousands of Americans stalk the forests in search of the mighty morel. The surest bet is to look where the mushrooms have been seen before. Elm, ash, and poplar trees are commonly associated with morels, as are recently burned areas.

But when should they go to maximize their chances of success? 

### _The question_

What weather conditions best predict the occurrence of morel mushrooms?

### _The dataset_

<img src="morel_sightings_by_month.png" width="900">

I started with research-grade observations of mushrooms in the Morchella genus uploaded to the iNaturalist database between 2002 and 2022. iNaturalist is a citizen science project that allows anyone to upload an image of a living thing, which the online community can then identify. Research-grade observations are those with at least two agreeing IDs, a location, and an exact date/time of observation.

Sometimes users upload observations with only a general location. There can be technical reasons for this (imprecise GPS readings, no GPS connection at the time of observation), but with morels there may be subterfuge afoot: morel spots are jealously guarded, so community naturalists have an added incentive to be vague. Because I'm ultimately going to associate morel observations with weather variables, I kept only those whose latitude-longitude coordinates were accurate to within 1,000 meters.

GBIF.org (14 December 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.fudjcr
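A minimal sketch of that accuracy filter, assuming the download carries GBIF's `coordinateUncertaintyInMeters` column (an assumption about the export format; the toy rows below are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the GBIF download. The column name
# "coordinateUncertaintyInMeters" is assumed from GBIF's simple CSV export;
# all values are invented.
raw = pd.DataFrame({
    "species": ["Morchella americana", "Morchella americana",
                "Morchella importuna", "Morchella americana"],
    "decimalLatitude": [42.28, 39.10, 45.52, 41.00],
    "decimalLongitude": [-83.74, -84.50, -122.68, -90.00],
    "coordinateUncertaintyInMeters": [25.0, 5000.0, 1000.0, None],
})

MAX_UNCERTAINTY_M = 1000  # keep points accurate to within 1,000 m

# .le() returns False for missing values, so rows with unknown precision
# are dropped as well (a judgment call in this sketch).
precise = raw[raw["coordinateUncertaintyInMeters"].le(MAX_UNCERTAINTY_M)]
print(len(precise))
```

Here the 25 m and 1,000 m rows survive, while the 5,000 m row and the row with no uncertainty value are removed.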

In [None]:
# Import libraries to preview the data
import pandas as pd

In [None]:
df = pd.read_csv("morchella_observations_cleaned.csv")
df.head()

There's some metadata about each observation that's not very relevant to my project. I'll tighten up the dataframe to include only what I'm interested in: location, time of observation, and (possibly) the species of morel. Let's see what can be dropped.

In [None]:
df.columns

In [None]:
df = df.drop(columns=['gbifID','datasetKey','occurrenceID','Unnamed: 12'])
df.head()
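Instead of dropping columns after loading, you can name the columns to keep when reading the file. Below is a minimal sketch using `pd.read_csv(..., usecols=...)`. The inline CSV is an invented two-row stand-in for `morchella_observations_cleaned.csv`, and the keep-list is an assumption based on the columns this notebook uses later:

```python
import io

import pandas as pd

# Invented excerpt standing in for the real CSV, which has more columns
csv_text = """gbifID,datasetKey,species,decimalLatitude,decimalLongitude,year,month,day
1,abc,Morchella americana,42.2808,-83.743,2021,5,2
2,abc,Morchella importuna,45.5152,-122.6784,2020,4,18
"""

# Columns the analysis needs (names taken from the code in this notebook)
KEEP = ["species", "decimalLatitude", "decimalLongitude", "year", "month", "day"]

# usecols discards everything else at parse time
df_small = pd.read_csv(io.StringIO(csv_text), usecols=KEEP)
print(df_small.columns.tolist())
```

A keep-list also prevents stray index columns such as `Unnamed: 12` from ever entering the frame.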

Better! I'm not yet sure whether I'll do a species-level analysis, so it would help to know how many observations there are for each species.

In [None]:
df.species.value_counts()

Most of the observations are of Morchella americana. Of the other species observed, six have counts large enough to potentially support useful conclusions about occurrence. I'll start with all the morel observations together, then perhaps do a species-by-species analysis.
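When I get to the species-level analysis, one option is to keep only the species that clear a minimum count. Below is a minimal sketch on toy data. The cutoff `MIN_OBS` is a hypothetical value, not a statistically derived threshold. By default `value_counts` skips missing values, so observations without a species-level ID do not count toward any species:

```python
import pandas as pd

# Toy observations; the real counts come from df.species.value_counts()
obs = pd.DataFrame({"species": ["Morchella americana"] * 5
                    + ["Morchella importuna"] * 3
                    + ["Morchella snyderi"] * 1})

MIN_OBS = 3  # hypothetical cutoff for a "large enough" sample

counts = obs["species"].value_counts()
common = counts[counts >= MIN_OBS].index           # species that clear the cutoff
per_species = obs[obs["species"].isin(common)]      # rows for those species only
print(sorted(common))
```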

I need to find the weather in the days leading up to each observation. Based on what is already known about morel fruiting habits, I'm interested in these variables:

* Soil temperature
* Air temperature
* Amount of precipitation
* Relative humidity
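Each of these variables arrives as a short time series. A simple way to compare observations is to collapse each window into a few summary numbers. Below is a minimal sketch under stated assumptions. The response dict is invented, though it follows Open-Meteo's output shape (parallel lists under `hourly` and `daily`). Summarizing with means and a precipitation total is a choice I made, not an established rule for morels. `summarize_window` is a hypothetical helper:

```python
import math
import statistics

# Invented response in the shape Open-Meteo returns (only three entries shown).
# Keys mirror the variables requested later in this notebook.
response = {
    "hourly": {
        "time": ["2022-05-01T00:00", "2022-05-01T01:00", "2022-05-01T02:00"],
        "temperature_2m": [10.0, 11.0, 12.0],
        "relativehumidity_2m": [80, 85, 90],
        "soil_temperature_0_to_7cm": [9.0, 9.5, 10.0],
    },
    "daily": {
        "time": ["2022-04-29", "2022-04-30", "2022-05-01"],
        "precipitation_sum": [0.0, 4.2, 1.3],
    },
}


def _mean(values):
    """Mean of the non-missing values, in case the API returns null for some hours."""
    present = [v for v in values if v is not None]
    return statistics.fmean(present) if present else math.nan


def summarize_window(resp):
    """Collapse one observation's weather window into four summary numbers."""
    hourly, daily = resp["hourly"], resp["daily"]
    return {
        "mean_soil_temp": _mean(hourly["soil_temperature_0_to_7cm"]),
        "mean_air_temp": _mean(hourly["temperature_2m"]),
        "mean_rel_humidity": _mean(hourly["relativehumidity_2m"]),
        "total_precip": sum(v for v in daily["precipitation_sum"] if v is not None),
    }


print(summarize_window(response))
```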

To do this, I'll need an API with historical weather data. I'm grateful to find that Open-Meteo offers just such a resource. [See here](https://openmeteo.substack.com/p/60-years-of-historical-weather-as) for more on the herculean process of creating it.

API calls take this form and return a JSON object:

`https://archive-api.open-meteo.com/v1/era5?latitude=52.5201&longitude=13.4121&start_date=2022-11-10&end_date=2022-12-09&temperature_unit=fahrenheit&hourly=temperature_2m,relativehumidity_2m&daily=temperature_2m_max,temperature_2m_min,precipitation_sum`
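The same kind of URL can be assembled from a dictionary of named parameters, which is easier to read and edit than one long string. Below is a minimal sketch using the standard library's `urlencode`. `build_request_url` is a hypothetical helper, and the variable list mirrors the request used later in this notebook. (`requests.get` also accepts a `params` dict directly.)

```python
from urllib.parse import urlencode

API_URL = "https://archive-api.open-meteo.com/v1/era5"


def build_request_url(latitude, longitude, start_date, end_date):
    """Assemble an Open-Meteo archive URL from named parameters."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": "temperature_2m,relativehumidity_2m,soil_temperature_0_to_7cm",
        "daily": "precipitation_sum",
        "timezone": "auto",
    }
    # safe="," keeps the comma-separated variable lists unescaped
    return f"{API_URL}?{urlencode(params, safe=',')}"


url = build_request_url(52.5201, 13.4121, "2022-11-10", "2022-12-09")
print(url)
```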

Note that the API accepts coordinates with up to four decimal places and expects dates in "YYYY-MM-DD" format. To minimize format-wrangling in my API call functions, I'll reshape the dataframe so that it meets these requirements.

In [None]:
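# Round coordinates to the API's four-decimal precision
api_df = df.round({'decimalLatitude': 4, 'decimalLongitude': 4})
# Build "YYYY-MM-DD" strings from the separate year/month/day columns
api_df['date'] = pd.to_datetime(api_df[['year', 'month', 'day']]).dt.strftime('%Y-%m-%d')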
api_df.head()
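How much does rounding to four decimal places cost? Here is a quick worked check. It assumes roughly 111,320 m per degree, an equatorial approximation; a degree of latitude varies slightly with latitude. The worst-case shift turns out to be a few meters, tiny next to the 1,000 m accuracy filter. `rounding_error_m` is a hypothetical helper:

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree (equatorial value)


def rounding_error_m(decimals, latitude_deg=0.0):
    """Worst-case (north-south, east-west) shift in meters from rounding coordinates."""
    step_deg = 10 ** -decimals                      # size of the last kept decimal place
    half_step_m = step_deg / 2 * METERS_PER_DEGREE  # rounding moves a point at most half a step
    north_south = half_step_m
    # Lines of longitude converge toward the poles, shrinking east-west distances
    east_west = half_step_m * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = rounding_error_m(4, latitude_deg=45.0)
print(round(ns, 2), round(ew, 2))
```

At 45° N this gives about 5.57 m north-south and 3.94 m east-west.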

In [None]:
# Import libraries to work with API data
import datetime

import requests

api_url = 'https://archive-api.open-meteo.com/v1/era5'
weather_data_request = "hourly=temperature_2m,relativehumidity_2m,soil_temperature_0_to_7cm&daily=precipitation_sum&timezone=auto"

weather_data = []

for _, row in api_df.iterrows():
    latitude, longitude = row['decimalLatitude'], row['decimalLongitude']

    # Window: the observation date plus the three days before it (four days inclusive)
    end_date = datetime.date(int(row['year']), int(row['month']), int(row['day']))
    start_date = end_date - datetime.timedelta(days=3)

    # date.isoformat() produces the "YYYY-MM-DD" strings the API expects
    base_url = (f"{api_url}?latitude={latitude}&longitude={longitude}"
                f"&start_date={start_date.isoformat()}&end_date={end_date.isoformat()}&")
    request_url = base_url + weather_data_request

    # One blocking request per observation; the timeout stops a stalled call from hanging the loop
    response = requests.get(request_url, timeout=30)
    weather_data.append(response.json())

In [None]:
weather_df = pd.DataFrame.from_records(weather_data)

# Both frames share a 0..n-1 index, so rows line up observation by observation
combined_df = pd.concat([df, weather_df], axis=1)
combined_df.head()

combined_df.to_csv('morchella_observations_and_weather.csv')

I am absolutely certain that there are smoother and more canonical ways to do this, both in pandas and in Python, but now I have a dataset that links each observation with the weather over the four days ending on its observation date. If I were dealing with a larger dataset, I would move this into a database, but since it comprises only about 1,800 observations, I'm going to keep working from a CSV file.
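In pandas, a more canonical route is to build every request URL at once with column-wise string operations instead of a row loop. Below is a minimal sketch. It assumes the `api_df` column names used above and the same four-day window as the loop; the two toy rows are invented:

```python
import pandas as pd

API_URL = "https://archive-api.open-meteo.com/v1/era5"
WEATHER_PARAMS = ("hourly=temperature_2m,relativehumidity_2m,soil_temperature_0_to_7cm"
                  "&daily=precipitation_sum&timezone=auto")

# Toy frame with the same column names as api_df
api_df = pd.DataFrame({
    "decimalLatitude": [42.2808, 45.5152],
    "decimalLongitude": [-83.743, -122.6784],
    "year": [2021, 2020], "month": [5, 4], "day": [2, 18],
})

# Whole-column date arithmetic instead of per-row datetime calls
end = pd.to_datetime(api_df[["year", "month", "day"]])
start = end - pd.Timedelta(days=3)  # same four-day window as the loop

# String + Series concatenation is applied element-wise
api_df["request_url"] = (
    API_URL
    + "?latitude=" + api_df["decimalLatitude"].astype(str)
    + "&longitude=" + api_df["decimalLongitude"].astype(str)
    + "&start_date=" + start.dt.strftime("%Y-%m-%d")
    + "&end_date=" + end.dt.strftime("%Y-%m-%d")
    + "&" + WEATHER_PARAMS
)
print(api_df["request_url"].iloc[0])
```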

At this stage, there are already some things I'd like to have done better:

* Generate the API request links without iterating through the dataframe. With pandas data, there's usually a better approach than iterating.
* Request data from the API faster. Executing that block took 17 minutes! It might have gone faster had I used Python's asyncio library to make the API requests concurrently.
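Below is a minimal sketch of the concurrent version. It assumes Python 3.9+ for `asyncio.to_thread`. `fetch_json` and `fetch_all` are hypothetical helpers. The sketch uses the standard library's `urllib` rather than `requests` so it has no extra dependencies. A semaphore caps the number of simultaneous calls so a free API isn't flooded; the cap of 8 is an arbitrary choice:

```python
import asyncio
import json
import urllib.request


def fetch_json(url, timeout=30):
    """Blocking GET that decodes the JSON body.

    Unlike requests, urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


async def fetch_all(urls, fetch=fetch_json, max_concurrency=8):
    """Run blocking fetches in worker threads, at most max_concurrency at a time.

    asyncio.gather returns results in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            # to_thread keeps the blocking network call off the event loop
            return await asyncio.to_thread(fetch, url)

    return await asyncio.gather(*(bounded(url) for url in urls))


# In a script:   weather_data = asyncio.run(fetch_all(request_urls))
# In Jupyter:    weather_data = await fetch_all(request_urls)
# (a notebook already runs an event loop, so asyncio.run() raises there)
```

Each request spends most of its time waiting on the network, so threads are enough here. An async HTTP client would be the alternative, at the cost of an extra dependency.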

In the next notebook, I'll explore the data to make connections between weather and morel appearances. I'm envisioning a few challenges:

* Each observation is now linked to a time series with both hourly and daily weather information, so each row of the dataframe effectively becomes much wider or much longer, depending on how the time series is unpacked. I've never worked with data in this format. I think R will be better suited to nesting dataframes within each other.
* The time series data is stored as a bunch of JSON objects, and I'm not sure of the best way to parse them.
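If I stay in pandas, one option is to reshape the nested time series into long format: one row per observation-hour, with a key for grouping the rows back by sighting. Below is a minimal sketch with invented values and a hypothetical `explode_hourly` helper. One wrinkle: `to_csv` writes each nested dict as its Python representation (single quotes). `json.loads` rejects that when the CSV is read back, so the sketch parses string cells with `ast.literal_eval` instead:

```python
import ast

import pandas as pd

# One cell as it looks after a to_csv round trip: a dict's Python repr.
# Values are invented.
saved_hourly = str({
    "time": ["2022-05-01T00:00", "2022-05-01T01:00"],
    "temperature_2m": [10.0, 11.0],
    "soil_temperature_0_to_7cm": [9.0, 9.5],
})

combined = pd.DataFrame({"species": ["Morchella americana"], "hourly": [saved_hourly]})


def explode_hourly(frame):
    """Turn each row's nested hourly dict into one row per hour (long format)."""
    pieces = []
    for obs_id, cell in frame["hourly"].items():
        # Parse repr strings from a CSV; in-memory dicts pass straight through
        hourly = ast.literal_eval(cell) if isinstance(cell, str) else cell
        piece = pd.DataFrame(hourly)   # columns: time plus one per weather variable
        piece["observation"] = obs_id  # link each hour back to its sighting
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


long_hourly = explode_hourly(combined)
print(long_hourly.shape)
```

From here, `long_hourly.groupby("observation")` gives per-sighting summaries without nesting dataframes.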