You're the new data engineer of a scientific team in charge of monitoring CO2 levels in the atmosphere, which are at their highest in 800,000 years.
You have to give your best estimate of CO2 levels for 2050.
Your engineering team is famous for taking great care of the developer experience: using types, small functions (built with .map, .filter, .reduce), tests and logs.
Your goal is to map, parse, and filter CO2 concentration levels in the atmosphere, measured by an observatory in Hawaii from 1950 to 2022.
For convenience, the CO2 concentration levels are already included in the file utils/ClimateService.
- Install a Scala-compatible IDE: Visual Studio with a Scala plugin, or IntelliJ IDEA: https://www.jetbrains.com/idea/
- Install Scala: https://docs.scala-lang.org/getting-started/index.html
- Another link if the first one does not work: https://www.scala-sbt.org/download.html
- Help your colleagues set up their environment; it is the best way to learn.
sbt run
Should give you an "implementation is missing" error:
(...)
2022-05-16 18:33:48 [run-main-0] INFO com.github.polomarcus.main.Main$ - Starting the app
[error] (run-main-0) scala.NotImplementedError: an implementation is missing
Same for sbt test.
Tips: having trouble installing IDEA, sbt or Scala? You can use Docker and Docker Compose to run this code, and write code in your default IDE or in a web IDE (https://scastie.scala-lang.org/):
docker-compose build my-scala-app
docker-compose run my-scala-app bash # connect to your container to access sbt
> sbt test
# or
> sbt run
Pro Tips: https://www.scala-sbt.org/1.x/docs/Running.html#Continuous+build+and+test
Make a command run when one or more source files change by prefixing the command with ~. For example, in sbt shell try:
sbt
> ~ testQuick
- Look at the tests for the function called "isClimateRelated" and add one more test in
test/scala/ClimateServiceTest
- Look at and update the function called "isClimateRelated" inside
main/scala/com/github/polomarcus/utils/ClimateService
- To see if your code works, run
testOnly ClimateServiceTest -- -z isClimateRelated
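As a rough illustration of the kind of function this step asks for, here is a minimal sketch. The signature (a `String` in, a `Boolean` out), the object name `ClimateKeywordSketch`, and the keyword list are all assumptions made for this example; the function in the project may be defined differently.

```scala
object ClimateKeywordSketch {
  // Hypothetical keyword list; the real project may use different terms.
  private val climateKeywords: List[String] =
    List("climate change", "global warming", "ipcc")

  // Returns true when the description mentions at least one keyword,
  // ignoring case.
  def isClimateRelated(description: String): Boolean = {
    val normalized = description.toLowerCase
    climateKeywords.exists(keyword => normalized.contains(keyword))
  }
}
```

Adding "one more test" then amounts to picking a sentence that should (or should not) match and asserting on the result.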
Using the data from Hawaii about CO2 concentration in the atmosphere (stored inside the function "getCO2RawDataFromHawaii()"), iterate over it and find the difference between the max and the min value.
- Look at the tests for "parseRawData" and add one more test in
test/scala/ClimateService
- Look at and update the "parseRawData" function inside
main/scala/com/github/polomarcus/utils/ClimateService
- Create your own function to find the min and max values. Write unit tests and run
sbt test
Tips:
- Use the Scala API to get the max and min from a list: https://www.w3resource.com/scala-exercises/list/scala-list-exercise-6.php
- You can also use "reduce functions" such as foldLeft: https://alvinalexander.com/scala/how-to-walk-scala-collections-reduceleft-foldright-cookbook/
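The parsing and min/max steps above could be sketched as follows. Everything here is an assumption made for illustration: the `CO2Record` shape, the raw rows being `(year, month, ppm)` tuples, the rule that a non-positive ppm marks a missing measurement, and the `getMinMax` signature. The sample values in the usage note are synthetic, not real measurements.

```scala
object MinMaxSketch {
  // Assumed record shape: year, month, and CO2 concentration in ppm.
  final case class CO2Record(year: Int, month: Int, ppm: Double)

  // Assumption: raw rows are (year, month, ppm) tuples, and a non-positive
  // ppm marks a missing measurement, which is mapped to None.
  def parseRawData(rows: List[(Int, Int, Double)]): List[Option[CO2Record]] =
    rows.map { case (year, month, ppm) =>
      if (ppm > 0) Some(CO2Record(year, month, ppm)) else None
    }

  // Returns (min, max) in a single pass with foldLeft,
  // or None when the list is empty.
  def getMinMax(records: List[CO2Record]): Option[(Double, Double)] =
    records match {
      case Nil => None
      case head :: tail =>
        Some(tail.foldLeft((head.ppm, head.ppm)) { case ((lowest, highest), record) =>
          (math.min(lowest, record.ppm), math.max(highest, record.ppm))
        })
    }
}
```

For example, `MinMaxSketch.getMinMax(MinMaxSketch.parseRawData(rows).flatten)` drops the missing rows before folding. Returning an `Option` avoids the exception that `List.min` and `List.max` throw on an empty list.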
- Create your own function to find the min and max values for a specific year. Write unit tests.
Tips:
- Reuse getMinMax to create this function.
- Create your own function to compute the difference between the max and the min. Write unit tests.
Tips:
- https://www.tutorialspoint.com/scala/scala_options.htm
- https://blog.engineering.publicissapient.fr/2012/03/19/les-types-monadiques-de-scala-le-type-option/
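One possible way to combine these two tasks with `Option` is sketched below. The names `getMinMaxByYear` and `getDifferenceMaxMin`, the `CO2Record` shape, and the `getMinMax` signature are hypothetical, chosen only for this example.

```scala
object MinMaxByYearSketch {
  // Assumed record shape: year, month, and CO2 concentration in ppm.
  final case class CO2Record(year: Int, month: Int, ppm: Double)

  // Hypothetical helper: (min, max) of the ppm values, None when empty.
  def getMinMax(records: List[CO2Record]): Option[(Double, Double)] =
    if (records.isEmpty) None
    else {
      val values = records.map(_.ppm)
      Some((values.min, values.max))
    }

  // Reuses getMinMax on the records of a single year only.
  def getMinMaxByYear(records: List[CO2Record], year: Int): Option[(Double, Double)] =
    getMinMax(records.filter(_.year == year))

  // Maps over the Option, so an empty input yields None instead of throwing.
  def getDifferenceMaxMin(records: List[CO2Record]): Option[Double] =
    getMinMax(records).map { case (lowest, highest) => highest - lowest }
}
```

Because every function returns an `Option`, a year with no data simply produces `None`, and callers decide how to handle it (for instance with `getOrElse` or pattern matching).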
- Remove all values from December (12), since winter makes data unreliable there, with filterDecemberData inside main/scala/com/github/polomarcus/utils/ClimateService
- Implement showCO2Data inside main/scala/com/github/polomarcus/utils/ClimateService
- Make your Main program work using
sbt run
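A minimal sketch of these two functions is shown below, assuming the parsed data is a `List[Option[CO2Record]]`. The signatures, the `CO2Record` shape, and the display format are assumptions made for this example; the project may log through its logger instead of printing.

```scala
object FilterAndShowSketch {
  // Assumed record shape: year, month, and CO2 concentration in ppm.
  final case class CO2Record(year: Int, month: Int, ppm: Double) {
    // Hypothetical display format.
    def show: String = s"CO2Record from $year/$month has $ppm ppm"
  }

  // Drops missing rows (None) and every record measured in December.
  def filterDecemberData(records: List[Option[CO2Record]]): List[CO2Record] =
    records.flatten.filter(_.month != 12)

  // Prints one line per valid record.
  def showCO2Data(records: List[Option[CO2Record]]): Unit =
    records.flatten.foreach(record => println(record.show))
}
```

Here `flatten` removes the `None` entries before `filter` is applied, which keeps each step small and easy to test.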
Estimate CO2 levels for 2050 based on past data.
Tips: Batch processing / Stream processing ?
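The exercise does not prescribe an estimation method. One simple option is a least-squares straight line fitted to (year, ppm) points and extrapolated to 2050. This is a sketch under that assumption, not the expected solution: the names `fitLine` and `estimate` are hypothetical, and a straight line is only a first approximation of a trend that may not be linear.

```scala
object LinearTrendSketch {
  // Fits ppm = intercept + slope * year by ordinary least squares.
  // Returns Some((slope, intercept)), or None when fewer than two
  // distinct years are available (the slope would be undefined).
  def fitLine(points: List[(Int, Double)]): Option[(Double, Double)] =
    if (points.map(_._1).distinct.size < 2) None
    else {
      val n = points.size.toDouble
      val meanYear = points.map(_._1.toDouble).sum / n
      val meanPpm = points.map(_._2).sum / n
      val covariance = points.map { case (year, ppm) => (year - meanYear) * (ppm - meanPpm) }.sum
      val variance = points.map { case (year, _) => (year - meanYear) * (year - meanYear) }.sum
      val slope = covariance / variance
      Some((slope, meanPpm - slope * meanYear))
    }

  // Evaluates the fitted line at the target year.
  def estimate(points: List[(Int, Double)], targetYear: Int): Option[Double] =
    fitLine(points).map { case (slope, intercept) => intercept + slope * targetYear }
}
```

In practice the input points could be yearly averages computed from the filtered records, for example with `groupBy(_.year)`, and then passed to `estimate(points, 2050)`.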
If it works on your machine, congrats! But remember, engineers work as a team, and to be sure it works on others' machines, you have to do something more.
Test it on a remote server now, using a Continuous Integration (CI) system such as GitHub Actions:
- Have a look at the .github/workflows folder and files
- Something weird? Have a look at their documentation: https://github.com/features/actions
- Ready to run a CI job? Go to your GitHub fork/clone of this repository and find the "Actions" tab
- Find your CI job running
- Create a CI workflow using Docker to run the sbt test command (inspiration: https://github.com/polomarcus/television-news-analyser/blob/main/.github/workflows/docker-compose.yml#L7-L17)
