Pangeo training material for Big data geosciences #3147

Merged
merged 79 commits into from
Feb 18, 2022

annefou
Collaborator

@annefou annefou commented Jan 28, 2022

We have:

  • pangeo: Pangeo ecosystem 101 for everyone - Introduction to Xarray Galaxy Tools
  • pangeo-notebook: Pangeo Notebook in Galaxy - Introduction to Xarray

The first one (pangeo 101) is meant for anyone and requires no programming skills (it uses Galaxy Tools). It introduces Pangeo and its community and shows how to use the Xarray tools in Galaxy.

The second one (pangeo notebook) makes use of the Pangeo JupyterLab interactive tool and is an introduction to Xarray for those who have basic Python programming skills.

@yvanlebras
Collaborator

Amazing! Thank you Anne! Really top! I did a first quick review and opened a PR against the nordicESMhub repo, but I think I did something wrong: one PR on the main branch and another on the correct pangeo one... Don't hesitate to ask if you have doubts on how to manage it ;)

@hexylena
Member

The second one (pangeo notebook) makes use of the Pangeo JupyterLab interactive tool and is an introduction to Xarray for those who have basic Python programming skills.

Fyi @annefou there is a new format you can opt-in to using, that generates the ipynb files automatically. You can see it in action here: https://training.galaxyproject.org/training-material/topics/data-science/ Anything tagged jupyter-notebook or rmarkdown-notebook has these files automatically generated from its GTN content, if that's interesting to you.

* Remove duplicated however

* Remove duplicates creating history mention
@annefou
Collaborator Author

annefou commented Jan 28, 2022

Fyi @annefou there is a new format you can opt-in to using, that generates the ipynb files automatically.

Wow!!! This is so cool!!! I was looking for something like that!!! I definitely want it.
Thank you so much!

@hexylena
Member

Oh, I even wrote documentation! https://training.galaxyproject.org/training-material/topics/contributing/tutorials/create-new-tutorial-content/tutorial.html#automatic-jupyter-notebooks

Do not use the built in citation system

is also outdated, citations work now.

Anne Fouilloux and others added 6 commits February 18, 2022 14:30
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
Co-authored-by: Yvan Le Bras <yvan.le-bras@mnhn.fr>
@annefou
Collaborator Author

annefou commented Feb 18, 2022

thank you @yvanlebras and Solenne! I'll update the few pending comments (from previous review too). Many thanks for reviewing this material!

@annefou
Collaborator Author

annefou commented Feb 18, 2022

Ok. So I think I took into account all your comments. Thanks a lot for your review!

@yvanlebras
Collaborator

Amazing! I will try to test this final version now and validate it! If you think @annefou you can test mine ;) #3152 this can be amazing !!!! Have a nice week-end!

@annefou
Collaborator Author

annefou commented Feb 18, 2022

Amazing! I will try to test this final version now and validate it! If you think @annefou you can test mine ;) #3152 this can be amazing !!!! Have a nice week-end!

Cool. Yes I can review your training material! Thanks.

@yvanlebras
Collaborator

Really sorry... now that I have the dataset, I get an error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/jwd/main/041/778/41778869/tmp/ipykernel_354/4122857281.py in <module>
----> 1 dset = xr.open_dataset("CAMS-PM2_5-20211222.netcdf")

/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    477 
    478     if engine is None:
--> 479         engine = plugins.guess_engine(filename_or_obj)
    480 
    481     backend = plugins.get_backend(engine)

/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    150         )
    151 
--> 152     raise ValueError(error_msg)
    153 
    154 

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib', 'pydap', 'rasterio', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html

when typing this: dset = xr.open_dataset("CAMS-PM2_5-20211222.netcdf") ...

@annefou
Collaborator Author

annefou commented Feb 18, 2022

my jupyter notebook FYI https://3525516ba8d111f5-3742e8a717c04bddb0a50d763b550537.interactivetoolentrypoint.interactivetool.ecology.usegalaxy.eu/ipython/lab/tree/Untitled.ipynb

At first I was not sure why: this error usually happens when the type of the file is set to h5 instead of netcdf. But here it actually did not find the file, so the error is a bit misleading... Your file is in the data folder:

dset = xr.open_dataset("data/CAMS-PM2_5-20211222.netcdf")
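A minimal sketch of how the missing-file case could be caught before xarray tries to guess an engine, so that a wrong path fails with a clear message instead of the backend error above. The helper names `resolve_dataset` and `open_checked`, the list of folders searched, and the choice of `engine="netcdf4"` are assumptions for illustration and are not part of the tutorial.

```python
from pathlib import Path


def resolve_dataset(name, search_dirs=(".", "data")):
    """Return the first existing path for `name` among `search_dirs`.

    Hypothetical helper: the folders searched are an assumption.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"{name!r} not found in: {searched}")


def open_checked(name, engine="netcdf4"):
    """Open a dataset only after confirming that the file exists."""
    import xarray as xr  # imported lazily so the path check works without xarray

    # Passing `engine` explicitly skips xarray's engine guessing.
    return xr.open_dataset(resolve_dataset(name), engine=engine)
```

With this, `open_checked("CAMS-PM2_5-20211222.netcdf")` would look in the working directory first and then in `data/`, and raise `FileNotFoundError` if the file is in neither.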

@yvanlebras
Collaborator

ok, I retested from the start and it is ok now! I can go further! THANK YOU !

@yvanlebras
Collaborator

hey hey! Done!

image

@yvanlebras
Collaborator

Amazing tuto! Thank you Anne!!!!!

@gallardoalba gallardoalba merged commit 72087e5 into galaxyproject:main Feb 18, 2022
@bgruening
Member

What a cool tutorial!

@annefou annefou deleted the pangeo branch February 19, 2022 08:28
@shiltemann
Member

whoo!! so awesome! 🎉

(currently there seems to be a problem with rendering the slides video, but we are working on it!)

@annefou
Collaborator Author

annefou commented Feb 21, 2022

(currently there seems to be a problem with rendering the slides video, but we are working on it!)

Let me know if there is anything to do on my side.

@shiltemann
Member

@annefou nah, there was a small bug in the video generation, but the videos are up now :) The pronunciation of "Pangeo" is a bit off tho, so we will look into teaching it how to pronounce it

@annefou
Collaborator Author

annefou commented Feb 23, 2022

Cool! That's awesome!

I found a few "dots" (full stops) that cut sentences short, and sometimes it sounds very odd. I guess sentences were far too long. I have started to note precisely when it happens for the first video. Let me know how I can fix these issues.

in the first pangeo video:

  • 1:55 there is a dot at the end and it should be removed (probably my fault)!
    must be scalable ... current and future challenges of big data ... e.g. no dot between future and challenges.
  • 2:08 we should also remove the dot after use cases e.g. use cases as well as ...
  • 2:20 remove the dot after be e.g. cannot be tackled separately.
  • 2:30 remove dot after define e.g. developers can define priorities for future...
  • 3:22 remove dot after interface e.g. user interface with many functions...
  • 3:54 remove dot after Galaxy e.g. from Galaxy Tools can be useful.

We have similar issues in the second pangeo video (for pangeo-notebook).
Also netCDF is not pronounced correctly. I think I should have written net CDF or net-CDF (I forgot about it).

Thanks!

How can I fix these small issues?

@hexylena
Member

Also netCDF is not pronounced correctly. I think I should have written net CDF or net-CDF (I forgot about it).

You can add these in bin/ari-map.yml. Keep writing netCDF in your slides (better for screen readers/etc), and then the ari-map will map those terms to the way to pronounce them.
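To illustrate the idea of a pronunciation map, here is a rough Python sketch of substituting terms in the narration text before it reaches the speech engine. It is not the GTN's actual implementation and does not reproduce the real format of bin/ari-map.yml. The dictionary `PRONUNCIATIONS`, the function `apply_pronunciations`, and the phonetic spellings are all made up for the example.

```python
import re

# Hypothetical entries: written form -> how it should be read aloud.
PRONUNCIATIONS = {
    "netCDF": "net C D F",
    "Pangeo": "pan-gee-oh",
}


def apply_pronunciations(text, mapping=PRONUNCIATIONS):
    """Replace whole-word occurrences of mapped terms with their spoken form."""
    # Longest terms first so a longer term wins over a shorter prefix of it.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], text)
```

The slides themselves keep the written form (netCDF), and only the text sent to the voice is rewritten.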

@hexylena
Member

I guess sentences were far too long

Ahh I see what happened: you didn't use bullet points, so the wrapped lines were treated as individual lines. Until now most people have used bullet points or at least had a full sentence on a single line, rather than wrapping, which is what's causing the error.

If you rearrange the subtitles so an entire line of text is a single line in the file, this will fix it.
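The rearrangement described above could be done by hand, or roughly sketched in Python as below. This assumes the speaker notes are plain text in which a blank line separates paragraphs and bullets start with `- ` or `* `. The function name `unwrap_notes` and that layout are assumptions, not the GTN's tooling.

```python
def unwrap_notes(notes):
    """Join hard-wrapped lines so each bullet or paragraph occupies one line."""
    out = []
    for line in notes.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")                 # keep paragraph breaks
        elif stripped.startswith(("- ", "* ")) or not out or out[-1] == "":
            out.append(stripped)           # start of a new bullet or paragraph
        else:
            out[-1] += " " + stripped      # continuation of the previous line
    return "\n".join(out)
```

For example, a sentence wrapped over two lines comes back as a single line, so it would no longer be read as two separate sentences.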


6 participants