section 3 & 4 content #5

Merged
merged 14 commits into from
Mar 9, 2021
Conversation

stevehadd
Member

No description provided.

@stevehadd stevehadd requested a review from tam203 March 5, 2021 14:30
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@@ -2,122 +2,525 @@
"cells": [
Collaborator

typo

shjould -> should

tookls -> tools

stray * bullet after "how will my compute be scaled.."

zarr -> Zarr

Data Arrays -> Data arrays (lower-case a)

Compute Platform -> I would add local computer. Admittedly it doesn't scale (other than across multiple cores), but it's one of the good things about Pangeo: you can 'start small and local' and work your way up.

private cloud (European Weather Cloud) -> Maybe add JASMIN if some of the students have access


Reply via ReviewNB

@@ -2,122 +2,525 @@
"cells": [
Collaborator

The trade-off is that higher level services often are less portable resulting in vendor lock-in. So we balance the convenience of higher-level services in our Pangeo implementation with the goals of reproducible, shareable research which favour open-source tools deployed on low-level services.

The final sentence sort of contradicts the previous one. I think both are correct; it could just do with something to join them. For example:

This is the direction that cloud services are going, and indeed many of the low-level cloud services are easily swappable. However, many higher-level services that offer benefits such as lower maintenance or more streamlined integration can contribute to vendor lock-in.


Reply via ReviewNB

@@ -2,122 +2,525 @@
"cells": [
Collaborator

Might be worth mentioning that Spark and other libraries exist. No need to go into detail.


Reply via ReviewNB

@@ -2,122 +2,525 @@
"cells": [
Collaborator

How does it do this is a massively parallel way to speed up execution? There are three parts to the dask compute resources

This sentence doesn't read correctly to me. Not 100% sure what it's saying.

a client - usually the computer we are interacting on

(or the remote compute we are interacting with through the computer we are on/using) <- not phrased well!

typo:

cheduler -> scheduler

graphm -> graph

Have you mentioned compute/task graphs already?

The task graph will split up a large array by chunk, s

Tasks don't just split data up into chunks; they also split up algorithms or complex operations (I would argue).
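
To make that last point concrete, here is a rough sketch of a task graph that splits both the data (into chunks) and the algorithm (a mean broken into per-chunk sums and counts, then combined). It is only an illustration: the dict-of-tuples layout loosely mirrors the style of graph Dask works with, but the `get` function, the key names and the data are all hypothetical, and this toy evaluator runs tasks one at a time rather than handing them to a scheduler and workers.

```python
from operator import truediv

# Hypothetical data, split into three chunks of four values each.
data = list(range(12))
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]


def combine(*parts):
    """Add together the partial results from each chunk."""
    return sum(parts)


# Each key maps to either plain data or a task: (function, *argument keys).
graph = {f"chunk-{i}": chunk for i, chunk in enumerate(chunks)}
for i in range(len(chunks)):
    graph[f"sum-{i}"] = (sum, f"chunk-{i}")    # per-chunk partial sum
    graph[f"count-{i}"] = (len, f"chunk-{i}")  # per-chunk element count
graph["total"] = (combine, "sum-0", "sum-1", "sum-2")
graph["n"] = (combine, "count-0", "count-1", "count-2")
graph["mean"] = (truediv, "total", "n")


def get(graph, key, cache=None):
    """Toy evaluator (not Dask's): resolve a key's dependencies, then run it."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    node = graph[key]
    if isinstance(node, tuple) and callable(node[0]):
        func, *args = node
        resolved = [
            get(graph, arg, cache) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        result = func(*resolved)
    else:
        result = node  # plain data, nothing to compute
    cache[key] = result
    return result


print(get(graph, "mean"))  # 5.5
```

The per-chunk `sum-*` and `count-*` tasks do not depend on each other, which is what would let a real scheduler run them in parallel on separate workers before the final combine steps.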


Reply via ReviewNB

@@ -2,122 +2,525 @@
"cells": [
Collaborator

typo:

rtather -> rather

agraph -> a graph

python -> Python


Reply via ReviewNB

@@ -2,122 +2,525 @@
"cells": [
Collaborator

You can also visualise task graphs using Dask's visualize function, which is fun - https://docs.dask.org/en/latest/graphviz.html


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

Hopefully with the tools and libraries that are now available for scientific computing, such as dask, users of the platform in general, and specifically our first use case of Scientific Analyst or Researcher

This sentence doesn't quite end. Hopefully what? There are also a lot of commas, so maybe reword.


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

Typo

ecified All all the tex.. -> needs a full stop and only one "all", I think.

Really interesting digression, never knew this!


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

Described dataset - All data in a dataset is accessed through a single descriptor and contains all descriptions necessary for some one skilled in the domain the data describes to interpret the data.

..one skilled in the domain to fully interpret the data.

perhaps.


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

 A researcher should easily be able to find the data related that exists related to the problem that they are working on. This relies on sufficiently detailed description of datasets being contained in the metadata and being accessible without reading the whole dataset.

You may wish to mention data search engines as an emerging theme/technology.

input to their pipeline in addressing a research question -> I would suggest removing this to shorten the sentence; it's implied by "useful".

and integrate teir use of the data with the rest of their research pipeline.


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

Data tat is analysis ready sh -> that is


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

The more one optimises for a specific, the less optimised it is likely to be for other cases. So one wants to make data ready for as broad a spectrum of possible uses as one can, but also focusing on optimising for the most common use cases while not excluding

The sentence doesn't quite finish.


Reply via ReviewNB

@@ -2,71 +2,206 @@
"cells": [
Collaborator

 Informatics Lab largely Python focussed, R and scala are also important in the data science community,

Could change to Pangeo community. I'd also remove the specific mention of the Met Office; maybe just 'widely used across environmental science'.

I think this section needs a little refining for this context (I recognize it from Aaron's piece), but it is good, relevant stuff.


Reply via ReviewNB

@stevehadd stevehadd merged commit 596b715 into main Mar 9, 2021