Text for Introduction #24

Closed
StephanEwen opened this issue Apr 1, 2014 · 2 comments
Comments

@StephanEwen
Contributor

I cannot make a pull request for this section, because I am on mobile internet and cannot afford to clone the stratosphere.github.io repository. I have pasted my text below. Sorry Ufuk, for causing additional work.

Introduction

Analysis programs in Stratosphere are regular Java programs that implement transformations on data sets (e.g., filtering, mapping, joining, grouping). The data sets are initially created from sources (e.g., by reading files, or from collections). Results are returned via sinks, which may for example write the data to (distributed) files or print it to the command line. The sections on the program skeleton and transformations show the general template of a program and describe the available transformations.
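To make the source, transformation, sink shape concrete, here is a minimal sketch in plain Java. It deliberately uses `java.util.stream` as a stand-in and is not the Stratosphere API; the class name `PipelineSketch` and the method `wordLengths` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: java.util.stream stands in for a data set API.
public class PipelineSketch {

    // Source -> transformations -> sink, expressed on an in-memory collection.
    static List<Integer> wordLengths(List<String> source) {
        return source.stream()                    // source: a collection
                .filter(w -> !w.isEmpty())        // transformation: filtering
                .map(String::length)              // transformation: mapping
                .collect(Collectors.toList());    // sink: materialize the result
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "", "be", "or", "not");
        System.out.println(wordLengths(words));   // prints [2, 2, 2, 3]
    }
}
```

A real program would replace the collection with a file or collection source on an environment, and the final `collect` with a sink that writes files or prints.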

Stratosphere programs can run in a variety of contexts, for example locally as standalone programs, locally embedded in other programs, or on clusters of many machines (see [program skeleton] for how to define different environments). All programs are executed lazily: when the program runs and a transformation method is invoked on a data set, the call only creates a transformation operation. That operation is executed only once program execution is triggered on the environment. Whether the program is executed locally or on a cluster depends on the environment of the program.
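The lazy behaviour described above can be illustrated with a small plain-Java sketch. Again this is an analogy built on `java.util.stream`, not Stratosphere code: defining the pipeline plays the role of calling transformation methods, and the terminal `collect` plays the role of triggering execution on the environment. The names `LazySketch`, `define`, and `countsBeforeAndAfter` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: shows that defining a transformation does not run it.
public class LazySketch {

    // Counts how many times the user function was actually invoked.
    static final AtomicInteger invocations = new AtomicInteger();

    // Defines a mapping transformation; nothing is computed here.
    static Stream<Integer> define(List<Integer> source) {
        return source.stream().map(x -> {
            invocations.incrementAndGet();
            return x * 2;
        });
    }

    // Returns {invocations after definition, invocations after execution}.
    static int[] countsBeforeAndAfter(List<Integer> source) {
        invocations.set(0);
        Stream<Integer> plan = define(source);          // only builds the pipeline
        int before = invocations.get();
        plan.collect(Collectors.toList());              // triggers execution
        int after = invocations.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countsBeforeAndAfter(Arrays.asList(1, 2, 3));
        System.out.println(counts[0] + " -> " + counts[1]);  // prints 0 -> 3
    }
}
```

The counter stays at zero until the terminal operation runs, which mirrors how a transformation operation is only executed once execution is triggered.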

In contrast to Stratosphere's Record API, the Java API is strongly typed: all data sets and transformations accept typed elements rather than generic records. This makes it possible to catch typing errors very early and supports safe refactoring of programs.

@StephanEwen
Contributor Author

I tried to keep it short (I guess people do not want to read a lot). If you find something non-intuitive or if you think something is missing, please comment.

@uce
Contributor

uce commented Apr 1, 2014

Merged in d7c86a4.

I removed the reference to the record API. I like the conciseness, but we need to make sure that the text appeals to people new to the system.

@uce uce closed this as completed Apr 1, 2014