2023-08-25
- Create a new directory
mkdir 20230825_upgg_bootcamp_data_organization
cd 20230825_upgg_bootcamp_data_organization/
- Download data
wget https://reproducible-science-curriculum.github.io/organization-RR-Jupyter/data/gapminderDataFiveYear_superDirty.xlsx
wget https://reproducible-science-curriculum.github.io/organization-RR-Jupyter/data/gapminderDataFiveYear_superDirty.txt
- Inspect table and ask questions
head gapminderDataFiveYear_superDirty.txt
tail gapminderDataFiveYear_superDirty.txt
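Beyond head and tail, counting lines and fields per line can surface problems worth asking about. A minimal sketch, assuming a tab-delimited file (an assumption; confirm with head first). It runs on a small made-up file, sample.txt, not the real gapminder data, so the file name and contents are hypothetical.

```shell
# Build a tiny hypothetical tab-delimited table in a throwaway directory
demo_dir=$(mktemp -d)
printf 'country\tyear\tpop\n' > "$demo_dir/sample.txt"
printf 'Canada\t1952\t14785584\n' >> "$demo_dir/sample.txt"
printf 'Canada\t1957\n' >> "$demo_dir/sample.txt"   # deliberately missing one field

# Count the lines, header included (tr strips padding that some wc versions add)
n_lines=$(wc -l < "$demo_dir/sample.txt" | tr -d ' ')
echo "lines: $n_lines"

# Tally how many lines have each number of tab-separated fields,
# reported as fields:lines pairs
field_counts=$(awk -F'\t' '{ print NF }' "$demo_dir/sample.txt" \
  | sort -n | uniq -c | awk '{ print $2 ":" $1 }' | paste -sd' ' -)
echo "fields:lines -> $field_counts"
```

More than one fields:lines pair means some rows have a different number of columns than others, which is a good prompt for discussion.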
- Two main strategies to make analysis easier to follow: organization and documentation
- Go through slides
- After conclusions, ask folks to put GitHub link to notes
- Do exercise / ask questions
- After tips and tricks section: create data directory and raw data subdirectory
mkdir data
mkdir data/raw
- Move raw data to raw data directory
mv gapminderDataFiveYear_superDirty.* data/raw/
cd data/raw/
- Make raw data files read-only
chmod 444 gapminderDataFiveYear_superDirty.txt
chmod 444 gapminderDataFiveYear_superDirty.xlsx
- Ask folks to try to edit or remove the file
nano gapminderDataFiveYear_superDirty.txt
rm gapminderDataFiveYear_superDirty.txt
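A quick way to confirm the permission change took effect is to read the permission bits back with ls -l. A minimal sketch on a hypothetical stand-in file, raw_example.txt, in a throwaway directory:

```shell
# Work in a throwaway directory with a hypothetical stand-in file
demo_dir=$(mktemp -d)
printf 'country\tyear\n' > "$demo_dir/raw_example.txt"

# Remove write permission for everyone, as in the notes
chmod 444 "$demo_dir/raw_example.txt"

# The first ten characters of `ls -l` are the file type and permission bits
perms=$(ls -l "$demo_dir/raw_example.txt" | cut -c1-10)
echo "$perms"   # prints -r--r--r--
```

Note that chmod 444 guards against edits. Whether rm succeeds depends on write permission on the containing directory, not on the file, and rm typically asks for confirmation before deleting a write-protected file.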
- The second approach is record keeping and metadata
- Go back to main directory
cd ../../
- Create readme
nano README.txt
- Go to raw data and create readme
nano data/raw/README.txt
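Instead of typing the readme from scratch in nano, a starter file can be written with a here-document and then edited. A minimal sketch in a throwaway directory; the headings and wording are suggested placeholders, not a required format.

```shell
# Recreate the data/raw location inside a throwaway directory
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/data/raw"

# Write a starter README; every field below is a suggested placeholder
cat > "$demo_dir/data/raw/README.txt" <<EOF
Raw data README
Created: $(date +%Y-%m-%d)

Files:
- gapminderDataFiveYear_superDirty.txt
- gapminderDataFiveYear_superDirty.xlsx

Source: (where the files were downloaded from)
Notes: raw files are read-only; do not edit them here
EOF

# Show the result
cat "$demo_dir/data/raw/README.txt"
```

Because the here-document delimiter is unquoted, $(date +%Y-%m-%d) is expanded when the file is written, so the creation date is filled in automatically.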
- Project readme example
- Make cleaned directory
cd data
mkdir cleaned
- Create a copy of data
cp raw/gapminderDataFiveYear_superDirty.xlsx cleaned/.
cd cleaned/
chmod 644 gapminderDataFiveYear_superDirty.xlsx
- Modify data (remove duplicated row)
open gapminderDataFiveYear_superDirty.xlsx
- Add data modifications by hand to readme
nano README.txt
- Go back to main directory
cd ../../
- Create rest of directory
mkdir output
mkdir code
- Take a final look
ls
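The layout built up in these notes (data/raw, data/cleaned, output, code, and a top-level README.txt) can also be created in one go for a new project. A minimal sketch, assuming output and code sit at the top level next to data; example_project is a hypothetical name.

```shell
# Create the project inside a throwaway directory (hypothetical project name)
project_dir="$(mktemp -d)/example_project"

# mkdir -p creates missing parent directories and accepts several paths at once
mkdir -p "$project_dir/data/raw" \
         "$project_dir/data/cleaned" \
         "$project_dir/output" \
         "$project_dir/code"

# Empty placeholder for the project-level readme
touch "$project_dir/README.txt"

# List the directories that were created
(cd "$project_dir" && find . -type d | sort)
```

Starting each project from the same skeleton makes it easier to find things later, which is also the idea behind a repository template.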
- Examples from rotation
- Example template: https://github.com/jemilianosf/template_analysis
- Make a GitHub repository template: https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-template-repository