section 3 & 4 content #5
…ntake to conda requirements file
@@ -2,122 +2,525 @@ | |||
"cells": [ |
Typos:
shjould -> should
tookls -> tools
stray * bullet after "how will my compute be scaled.."
zarr -> Zarr
Data Arrays -> Data arrays (lower-case "a")
Compute Platform -> I would add the local computer. Admittedly it doesn't scale (other than across multiple cores), but one of the good things about Pangeo is that you can "start small and local" and work your way up.
private cloud (European Weather Cloud) -> Maybe add JASMIN if some of the students have access.
@@ -2,122 +2,525 @@ | |||
"cells": [ |
The trade-off is that higher level services often are less portable resulting in vendor lock-in. So we balance the convenience of higher-level services in our Pangeo implementation with the goals of reproducible, shareable research which favour open-source tools deployed on low-level services.
The final sentence sort of contradicts the previous one. I think both are correct; it could just do with something to join them.
This is the direction that cloud services are going, and indeed many of the low-level cloud services are easily swappable. However, many higher-level services that offer benefits such as lower maintenance or more streamlined integration can contribute to vendor lock-in.
@@ -2,122 +2,525 @@ | |||
"cells": [ |
It might be worth mentioning that Spark and other libraries exist. You don't have to give detail.
@@ -2,122 +2,525 @@ | |||
"cells": [ |
How does it do this is a massively parallel way to speed up execution? There are three parts to the dask compute resources
This sentence doesn't read correctly to me. I'm not 100% sure what it's saying.
a client - usually the computer we are interacting on
(or the remote compute we are interacting with through the computer we are on/using) <- not phrased well!
typo:
cheduler -> scheduler
graphm -> graph
Have you mentioned compute/task graphs already?
The task graph will split up a large array by chunk, s
I would argue that task graphs don't just split up data into chunks; they also break up algorithms or complex tasks.
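To make the client / scheduler / workers split and the idea of a task graph concrete, here is a toy sketch in plain Python. It is not how dask is implemented; `run_graph`, `chunk_sum`, `combine` and the `(function, args)` task layout are hypothetical names and conventions chosen for illustration. The "client" builds the graph, the "scheduler" loop works out which tasks are ready, and a thread pool stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk):
    """Per-chunk work: sum one piece of the data."""
    return sum(chunk)


def combine(*partials):
    """Algorithmic step: merge the per-chunk results."""
    return sum(partials)


def run_graph(graph, target, max_workers=3):
    """Toy scheduler: run tasks in waves as their dependencies complete.

    Each task is (function, args). An argument that is a string naming
    another key in the graph is treated as a dependency on that task.
    """
    def is_key(arg):
        return isinstance(arg, str) and arg in graph

    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as workers:
        while target not in results:
            # A task is ready once every task it depends on has a result.
            ready = [
                key for key, (_, args) in pending.items()
                if all(a in results for a in args if is_key(a))
            ]
            if not ready:
                raise ValueError("no runnable tasks: cycle or missing key")
            futures = {}
            for key in ready:
                func, args = pending.pop(key)
                resolved = [results[a] if is_key(a) else a for a in args]
                futures[key] = workers.submit(func, *resolved)
            for key, future in futures.items():
                results[key] = future.result()
    return results[target]


# "Client" side: describe the work as a graph rather than running it directly.
data = list(range(12))
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

graph = {f"chunk-{i}": (chunk_sum, [c]) for i, c in enumerate(chunks)}
graph["total"] = (combine, [f"chunk-{i}" for i in range(len(chunks))])

print(run_graph(graph, "total"))  # 66
```

The three `chunk-*` tasks have no dependencies, so they can run in parallel; `total` only runs once they have finished. That second step is the point made above: the graph encodes the algorithm (how partial results are combined), not only how the data is chunked.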
@@ -2,122 +2,525 @@ | |||
"cells": [ |
You can also visualise task graphs using dask's visualize method, which is fun: https://docs.dask.org/en/latest/graphviz.html
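A minimal sketch of what that could look like in the notebook, assuming `dask` and `numpy` are installed. The array shape, chunk size and output filename are illustrative choices, and rendering additionally needs the optional graphviz dependency, so the call is guarded.

```python
import dask.array as da

# A small lazy array split into four 2x2 chunks; nothing is computed yet.
x = da.ones((4, 4), chunks=(2, 2))
total = (x + 1).sum()

# The task graph behind the lazy result can be inspected as a mapping.
graph = dict(total.__dask_graph__())
print(len(graph), "tasks in the graph")

# Rendering needs graphviz (Python package and system library),
# so skip it gracefully when that is not available.
try:
    total.visualize(filename="task_graph.svg")  # written to the working directory
except (ImportError, RuntimeError) as exc:
    print("Graph rendering skipped:", exc)

# Only now is the graph actually executed.
result = total.compute()
print(result)  # 32.0
```

In a notebook, calling `total.visualize()` as the last expression in a cell displays the graph inline, which makes the per-chunk tasks and the final reduction easy to point at.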
@@ -2,71 +2,206 @@ | |||
"cells": [ |
Hopefully with the tools and libraries that are now available for scientific computing, such as dask, users of the platform in general, and specifically our first use case of Scientific Analyst or Researcher
This sentence doesn't quite end. Hopefully what? There are also a lot of commas; maybe reword.
@@ -2,71 +2,206 @@ | |||
"cells": [ |
Typo
ecified All all the tex.. -> add a full stop and keep only one "all", I think.
Really interesting digression, never knew this!
@@ -2,71 +2,206 @@ | |||
"cells": [ |
Described dataset - All data in a dataset is accessed through a single descriptor and contains all descriptions necessary for some one skilled in the domain the data describes to interpret the data.
..one skilled in the domain to fully interpret the data.
perhaps.
@@ -2,71 +2,206 @@ | |||
"cells": [ |
A researcher should easily be able to find the data related that exists related to the problem that they are working on. This relies on sufficiently detailed description of datasets being contained in the metadata and being accessible without reading the whole dataset.
You may wish to mention data search engines as an emerging theme/technology.
input to their pipeline in addressing a research question -> I would suggest removing this to shorten the sentence; it's implied by "useful".
and integrate teir use of the data with the rest of their research pipeline.
@@ -2,71 +2,206 @@ | |||
"cells": [ |
The more one optimises for a specific, the less optimised it is likely to be for other cases. So one wants to make data ready for as broad a spectrum of possible uses as one can, but also focusing on optimising for the most common use cases while not excluding
The sentence doesn't quite finish.
@@ -2,71 +2,206 @@ | |||
"cells": [ |
Informatics Lab largely Python focussed, R and scala are also important in the data science community,
Could change to Pangeo community. I'd also remove the specific mention of the Met Office; maybe just "widely used across environmental science".
I think this section needs a little refining for this context (I recognize it from Aarons's piece) but is good relevant stuff.
… corrections in section 3.
…PangeoLectures into section3_implementation
No description provided.