section 3 & 4 content #5

Merged

stevehadd merged 14 commits into main from section3_implementation

Mar 9, 2021

Member

stevehadd commented Mar 5, 2021

No description provided.

stevehadd added 3 commits

February 26, 2021 14:29


          Added more skeleton for section 3.

61c8530


          Added more skeleton for section 4.

64fac5b


          Added more content for section 3 and made some fixes to other sections.

67d1b52

stevehadd requested a review from tam203

March 5, 2021 14:30

review-notebook-app bot commented Mar 5, 2021

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

stevehadd added 3 commits

March 5, 2021 16:05


          added content about dask.

957a1c3


          Added some more content on notebooks and visualisations. Also added intake to conda requirements file

6e55ed9


          Added content on data.

14e714e

tam203 reviewed


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

typo

shjould -> should

tookls -> tools

stray * bullet after "how will my compute be scaled.."

zarr -> Zarr

Data Arrays -> Data arrays (lower A)

Compute Platform -> I would add local computer. Admittedly it doesn't scale (other than across multiple cores), but it's one of the good things about Pangeo: you can 'start small and local' and work your way up.

private cloud (European Weather Cloud) -> Maybe add JASMIN if some of the students have access


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

The trade-off is that higher level services often are less portable resulting in vendor lock-in. So we balance the convenience of higher-level services in our Pangeo implementation with the goals of reproducible, shareable research which favour open-source tools deployed on low-level services.

The final sentence sort of contradicts the previous one. I think both are correct; it could just do with something to join them.

This is the direction that cloud services are going, and indeed many of the low-level cloud services are easily swappable. However, many higher-level services that offer benefits such as lower maintenance or more streamlined integration can contribute to vendor lock-in.


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Might be worth mentioning that Spark and other libraries exist. Don't have to give detail.


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

How does it do this is a massively parallel way to speed up execution? There are three parts to the dask compute resources

This sentence doesn't read correctly to me. Not 100% sure what it's saying.

a client - usually the computer we are interacting on

(or the remote compute we are interacting with through the computer we are on/using) <- not phrased well!

typo:

cheduler -> scheduler

graphm -> graph

Have you mentioned compute/task graphs already?

The task graph will split up a large array by chunk, s

Tasks don't just split up data into chunks; they also split up algorithms or complex tasks (I would argue).
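
As a rough illustration of that last point (not taken from the notebook), here is a minimal sketch of a task graph written as a plain dictionary, in the style of the graph format that Dask documents, and run by a tiny hand-written evaluator rather than by Dask itself. The key names, the sample chunks and the `get` helper are all hypothetical. The per-chunk sums are the "split the data" tasks, while `total` and `mean` are steps of the algorithm that sit in the same graph.

```python
from operator import add, truediv

# Hypothetical task graph: each key maps either to a literal value or to a
# tuple of (function, *keys_of_arguments).
graph = {
    "chunk_a": [1, 2, 3],                 # first chunk of the data
    "chunk_b": [4, 5, 6],                 # second chunk of the data
    "count": 6,                           # total number of elements
    "sum_a": (sum, "chunk_a"),            # per-chunk task
    "sum_b": (sum, "chunk_b"),            # per-chunk task
    "total": (add, "sum_a", "sum_b"),     # algorithm step combining chunks
    "mean": (truediv, "total", "count"),  # algorithm step using the total
}


def get(graph, key, cache=None):
    """Evaluate `key` by recursively evaluating the keys it depends on.

    This is an illustrative stand-in for a scheduler, not Dask's own code.
    """
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *arg_keys = task
        result = func(*(get(graph, k, cache) for k in arg_keys))
    else:
        result = task  # a literal value such as a chunk of data
    cache[key] = result
    return result


mean = get(graph, "mean")
print(mean)
```

A real scheduler would run independent tasks such as `sum_a` and `sum_b` on different workers at the same time; this sketch evaluates them one after another.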


tam203 reviewed


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

typo:

rtather -> rather

agraph -> a graph

python -> Python


03_scalable_interactive_compute_pangeo_implementation.ipynb

		@@ -2,122 +2,525 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

You can also visualise graphs using Dask's graph visualisation, which is fun - https://docs.dask.org/en/latest/graphviz.html


tam203 reviewed


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Hopefully with the tools and libraries that are now available for scientific computing, such as dask, users of the platform in general, and specifically our first use case of Scientific Analyst or Researcher

This sentence doesn't quite end. Hopefully what? There are also a lot of commas, so maybe reword.


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Typo

ecified All all the tex.. -> full stop, and only one "all", I think.

Really interesting digression, never knew this!


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Described dataset - All data in a dataset is accessed through a single descriptor and contains all descriptions necessary for some one skilled in the domain the data describes to interpret the data.

..one skilled in the domain to fully interpret the data.

perhaps.


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

A researcher should easily be able to find the data ~~related~~ that exists related to the problem that they are working on. This relies on sufficiently detailed description of datasets being contained in the metadata and being accessible without reading the whole dataset.

You may wish to mention data search engines as an emerging theme/technology.

input to their pipeline in addressing a research question -> I would suggest removing this to shorten the sentence; it's implied by "useful".

and integrate ~~teir use of the data~~ with the rest of their research pipeline.


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Data tat is analysis ready sh -> that is


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

The more one optimises for a specific, the less optimised it is likely to be for other cases. So one wants to make data ready for as broad a spectrum of possible uses as one can, but also focusing on optimising for the most common use cases while not excluding

The sentence doesn't quite finish.


04_data_a_modern_approach.ipynb

		@@ -2,71 +2,206 @@
		"cells": [

Collaborator

tam203 Mar 8, 2021

Informatics Lab largely Python focussed, R and scala are also important in the data science community,

Could change to Pangeo community. I'd also remove the specific mention of the Met Office, maybe just 'widely used across environmental science'.

I think this section needs a little refining for this context (I recognize it from Aaron's piece), but it is good, relevant stuff.


stevehadd and others added 8 commits

March 8, 2021 17:41


          Added a conclusions and next steps notebook.

0577fb4


          Added examples of data to section 4 and info on catalogues. Also some corrections in section 3.

d3e7d41


          Further updates for session 2

a8e3191


          Added content and diagrams to sections 3 and 4.

8bd5297


          Updating requirements file

88a1eec


          Merge branch 'section3_implementation' of github.com:informatics-lab/PangeoLectures into section3_implementation

f7e5190


          Updates to notebooks after review and running the second session.

bcd0960


          Merge branch 'main' into section3_implementation

8e26fef

stevehadd merged commit 596b715 into main
