
Conversation

@luutuankiet
Contributor

added devcontainer config & setup scripts to spin up a host devcontainer matching the Spark master's environment, under efficient_data_processing_spark/data-processing-spark/1-lab-setup/containers/spark.

The devcontainer will be able to perform imports and run arbitrary code for further exploration if needed (a sketch of how these pieces fit together follows the list):

  • uses the matching Python version (3.9)
  • installs the Python packages pinned in requirements.txt
  • adds the Python code base (data-processing-spark) to the host PYTHONPATH
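
Here is a minimal sketch of how those three pieces could be wired up in devcontainer.json — illustrative only; the image tag, script name, and paths are assumptions, and the actual file in this PR is the source of truth:

```jsonc
// Minimal sketch of a devcontainer.json covering the three points above —
// not the exact file from this PR.
{
  "name": "spark-lab",
  // assumption: a Python 3.9 base image to match the Spark master
  "image": "mcr.microsoft.com/devcontainers/python:3.9",
  // assumption: the setup script installs requirements.txt and utils
  "postCreateCommand": "bash .devcontainer/postCreateCommand.sh",
  "containerEnv": {
    // put the course code base on PYTHONPATH so imports resolve
    "PYTHONPATH": "${containerWorkspaceFolder}/efficient_data_processing_spark/data-processing-spark"
  },
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  }
}
```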

@josephmachado
Owner

Wow, thank you for the great work @luutuankiet

I am wondering if it would be possible to add any tests for this?

@luutuankiet
Contributor Author

@josephmachado good catch, I don't have much experience writing these kinds of tests for containers, but I'll give it a shot!

For now I am leaning toward a workflow that builds a container from the devcontainer.json instructions, then runs the primary make commands from the course, for example make up, make setup, etc.
The test should pass if the make commands run as expected, but I think most of the make commands are docker exec/run wrappers that don't really surface an exit code to assert on... I'll do some digging and let you know if this can be achieved. Otherwise, let's reject and close this PR if it doesn't make sense test-wise.
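
To make the idea concrete, here is an untested sketch using the devcontainers/ci GitHub Action. The workflow name and subFolder are assumptions, and whether the make targets' exit codes are meaningful to assert on is exactly the open question above:

```yaml
# .github/workflows/devcontainer-smoke.yml — untested sketch only.
name: devcontainer smoke test
on: [pull_request]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: devcontainers/ci@v0.3
        with:
          # assumption: the folder holding this PR's devcontainer config
          subFolder: efficient_data_processing_spark/data-processing-spark/1-lab-setup/containers/spark
          # the step fails if this command chain exits non-zero
          runCmd: make up && make setup
```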

For now you can test it yourself by opening the commit in a Codespace. It actually helped me debug a network-related issue when I tried to set up the lab.

@josephmachado
Owner


Ah, I should clarify here @luutuankiet: if you can record a video or add instructions on how to use this, others and I can benefit from it.

With the instructions I'll try to recreate it (locally and on Codespaces), and if that works, we should be good to go.

@luutuankiet
Contributor Author

luutuankiet commented May 29, 2024

@josephmachado gotcha, please find the instructions below:

  1. Clone the repo locally, then run the VS Code command "Rebuild and Reopen in Container" (or "Build and Reopen in Container") to spin up the host container.

Alternatively, head over to my branch and hit "Open in codespace", which will spin up a host container on GitHub Codespaces. (A command-line alternative is sketched below.)
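
If you prefer a terminal over the command palette, the devcontainer CLI can do the same build headlessly. A sketch, assuming Node/npm and Docker are already available:

```sh
# Build and start the devcontainer without the VS Code UI.
npm install -g @devcontainers/cli
devcontainer up --workspace-folder .
# Then run commands inside it, e.g.:
devcontainer exec --workspace-folder . make up
```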

  2. Wait for the container to finish building. The first build will take some time, as I've bundled a couple of VS Code extensions in devcontainer.json and some utils in postCreateCommand.sh. Feel free to comment out features you don't need (see the hypothetical excerpt below this list), as long as each one:
  • is in the customizations.extensions block of devcontainer.json ("customizations.settings" is required for the paths to work)
  • is not Python-related in postCreateCommand.sh

(Other files such as env_init.sh and source_env.sh must also be kept as-is so the scripts are sourced and invoked correctly. Also, run the VS Code command "Reload Window" if VS Code shows any extension errors after the container finishes building.)
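
To make the "safe to comment out" boundary concrete, here is a hypothetical layout of postCreateCommand.sh — illustrative only; the actual script in this PR is the source of truth:

```sh
#!/usr/bin/env bash
# Hypothetical layout for illustration — see the real postCreateCommand.sh in this PR.

# --- Python setup: required, do not comment out ---
pip install -r requirements.txt

# --- Convenience utils: safe to comment out if unwanted ---
# (shell niceties and extra CLI tools bundled for the lab)
```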

  3. Once the container is up, run make up and the other make commands to test the setup (a one-liner version is sketched below).
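For reference, the smoke test is just chaining the course's make targets; make up and make setup are the ones named earlier, and any further targets are course-specific:

```sh
# Run from the repo root inside the devcontainer.
make up && make setup && echo "lab setup OK"
```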
  4. The devcontainer should now show context definitions when hovering over any code in the data-processing-spark folder.


As a closing note, I find this setup beneficial for anyone who likes to get their hands dirty exploring the source code and understanding how it all fits together. The make commands are intuitive as they are, but not without limitations, since they're wrapped in docker exec/run commands that abstract away the code flow.
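
(One small tip for peeking behind that wrapping: make's dry-run flag prints the commands a target would execute, e.g. the underlying docker exec/run calls, without running them.)

```sh
# Show what a target would run, without executing it.
make -n up
```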

@josephmachado
Owner

This is great, TY

@josephmachado
Owner

LGTM

@josephmachado josephmachado merged commit cd5316e into josephmachado:main May 29, 2024