Update Alaska bottom trawl data download and data compile #4

Closed · 3 tasks done
Melissa-Karp opened this issue Apr 18, 2024 · 8 comments
Melissa-Karp (Collaborator) commented Apr 18, 2024

When the API is ready and @Melissa-Karp is ready with the R script, I will need @EmilyMarkowitz-NOAA to modify the AK code for downloading (the download_ak.R file) and for merging the various data files within the Compile_Dismap_Current.R file. Please fork this repository and submit a pull request when complete.

Task List

  • Show me how to do pull requests and issues on GitHub
  • Edit the download_ak.R script to include a working API to download the files from FOSS (a rough sketch of such a download follows this list)
  • Edit the Compile script to properly merge the AK data files for each region and match the final format and column names
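
A minimal sketch, not part of the original issue, of what an API-based download in download_ak.R might look like, assuming a paginated JSON REST endpoint; the base URL, query parameters, and the items field in the response are placeholders to be swapped out once the FOSS API is live:

# Hypothetical sketch of an API pull for download_ak.R (placeholder endpoint).
library(httr)
library(jsonlite)

foss_api_pull <- function(base_url, offset = 0, limit = 10000) {
  # Request one page of records from the (assumed) paginated JSON API
  resp <- httr::GET(url = base_url, query = list(offset = offset, limit = limit))
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Example usage (placeholder URL): page through the catch table and bind rows
# base_url <- "https://example.noaa.gov/foss/gap_products_foss_catch"
# pages <- list(); offset <- 0
# repeat {
#   out <- foss_api_pull(base_url, offset = offset)
#   if (is.null(out$items) || nrow(out$items) == 0) break
#   pages[[length(pages) + 1]] <- out$items
#   offset <- offset + nrow(out$items)
# }
# catch <- dplyr::bind_rows(pages)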
Melissa-Karp added the enhancement (New feature or request) label on Apr 18, 2024
Melissa-Karp self-assigned this on Apr 18, 2024
EmilyMarkowitz-NOAA (Contributor) commented Apr 18, 2024

Thanks for this!

✔️ I've shown you how to do pull requests (#3 and #2) and some issue management (#4).

⌚ Let me know when the scripts are ready and I'll get on it :) In the meantime, I have written a zero-filling script for our FOSS data that we may be able to apply to your code; it is provided as an example in the AFSC GAP team's production data documentation. It will look something like this:

# Load data
library(dplyr)
library(here)
library(readr)
catch <- readr::read_csv(file = here::here("data/gap_products_foss_catch.csv"))[,-1] # remove "row number" column
haul <- readr::read_csv(file = here::here("data/gap_products_foss_haul.csv"))[,-1] # remove "row number" column
species <- readr::read_csv(file = here::here("data/gap_products_foss_species.csv"))[,-1] # remove "row number" column

# Come up with the full combination of which species should be listed for which hauls/surveys:
# for zero-filled data, every species caught in a survey needs a zero or non-zero row entry for each haul
comb <- dplyr::full_join(
  x = dplyr::left_join(catch, haul, by = "HAULJOIN") %>%
    dplyr::select(SURVEY_DEFINITION_ID, SPECIES_CODE) %>%
    dplyr::distinct(),
  y = haul %>%
    dplyr::select(SURVEY_DEFINITION_ID, HAULJOIN) %>%
    dplyr::distinct(),
  by = "SURVEY_DEFINITION_ID",
  relationship = "many-to-many"
)

# Join data to make a full zero-filled CPUE dataset
dat <- comb %>%
  # add species data to the unique species-by-survey table
  dplyr::left_join(species, by = "SPECIES_CODE") %>%
  # add catch data
  dplyr::full_join(catch, by = c("SPECIES_CODE", "HAULJOIN")) %>%
  # add haul data
  dplyr::full_join(haul, by = c("SURVEY_DEFINITION_ID", "HAULJOIN")) %>%
  # fill the zero-filled rows: species-haul combinations with no catch record come out
  # of the joins as NA, so replace NA with 0 (is.na(), not is.null(), works on columns)
  dplyr::mutate(
    CPUE_KGKM2 = ifelse(is.na(CPUE_KGKM2), 0, CPUE_KGKM2),
    CPUE_KGHA  = CPUE_KGKM2/100, # 1 km^2 = 100 ha
    CPUE_NOKM2 = ifelse(is.na(CPUE_NOKM2), 0, CPUE_NOKM2),
    CPUE_NOHA  = CPUE_NOKM2/100, # 1 km^2 = 100 ha
    COUNT      = ifelse(is.na(COUNT), 0, COUNT),
    WEIGHT_KG  = ifelse(is.na(WEIGHT_KG), 0, WEIGHT_KG) )

Each table contains the columns and dimensions shown in the (omitted) screenshot previews of the Catch, Haul, and Species tables.
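
As a quick check (not part of the original comment), one might confirm the zero-fill produced the expected shape, i.e. that every survey now has one row per species per haul; the column names follow the script above:

# Sanity check (assumption, not in the original script): after zero-filling,
# each survey should have (number of species) x (number of hauls) rows.
dat %>%
  dplyr::group_by(SURVEY_DEFINITION_ID) %>%
  dplyr::summarise(
    n_rows    = dplyr::n(),
    n_species = dplyr::n_distinct(SPECIES_CODE),
    n_hauls   = dplyr::n_distinct(HAULJOIN),
    complete  = n_rows == n_species * n_hauls
  )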

Melissa-Karp (Collaborator, Author) commented:
Hi Em. I have updated the code on GitHub so it's ready for you to start making the necessary edits for the AK data. Ping me if you have any questions about the code. Thanks a bunch!

EmilyMarkowitz-NOAA (Contributor) commented:
Excellent! I am still waiting for the FOSS API to get up and running, and will jump on this as soon as it is available. If we get to crunch time, I have a backup plan where I will use the files downloaded in this Google folder. Just so I can plan, when is the latest this needs to be completed?
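
If that fallback is needed, pulling the shared files programmatically might look something like this sketch, which assumes the googledrive package; the folder ID is a placeholder for the linked folder:

# Hedged sketch of the backup plan: download CSVs from a shared Google Drive folder.
# The folder ID is a placeholder; authentication runs interactively via googledrive.
library(googledrive)

folder_id <- googledrive::as_id("PLACEHOLDER_FOLDER_ID")
files <- googledrive::drive_ls(path = folder_id)

# Download each file into data/ for the compile script to read
for (i in seq_len(nrow(files))) {
  googledrive::drive_download(
    file = files[i, ],
    path = file.path("data", files$name[i]),
    overwrite = TRUE
  )
}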

Melissa-Karp (Collaborator, Author) commented Apr 23, 2024 via email

Melissa-Karp (Collaborator, Author) commented Apr 23, 2024 via email

EmilyMarkowitz-NOAA (Contributor) commented:
Sounds like a plan! I'll work on a download-from-Oracle approach and then will add an API approach if the API is ready in time :) I'll let you know if I have any questions as I dig into this 👷‍♀️
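
For reference, a download-from-Oracle approach could look roughly like the sketch below, which assumes a DBI/odbc connection; the DSN, credentials, and schema/table names are placeholders rather than the final download_ak.R code:

# Hedged sketch of a download-from-Oracle approach; the DSN, credentials, and
# table names are placeholders.
library(DBI)
library(odbc)

con <- DBI::dbConnect(odbc::odbc(),
                      dsn = "AFSC_ORACLE",              # placeholder DSN
                      uid = Sys.getenv("ORACLE_USER"),  # credentials from environment variables
                      pwd = Sys.getenv("ORACLE_PWD"))

# Pull the three tables used by the zero-filling script above
catch   <- DBI::dbGetQuery(con, "SELECT * FROM GAP_PRODUCTS.FOSS_CATCH")
haul    <- DBI::dbGetQuery(con, "SELECT * FROM GAP_PRODUCTS.FOSS_HAUL")
species <- DBI::dbGetQuery(con, "SELECT * FROM GAP_PRODUCTS.FOSS_SPECIES")

DBI::dbDisconnect(con)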

Melissa-Karp (Collaborator, Author) commented Apr 23, 2024 via email

EmilyMarkowitz-NOAA added a commit to afsc-gap-products/DisMAP that referenced this issue on Apr 24, 2024:
…d data wrangling. Use #TOLEDO's to find EHM questions or comments. This is in support of nmfs-fish-tools#4
Melissa-Karp (Collaborator, Author) commented:
Hi Em. I incorporated your edits into the download_ak and compile scripts offline and pushed those changes to GitHub, so they are visible in this repo now. Thanks for the help.
