Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform repo into using nbdev framework #6

Merged
merged 40 commits into from Jul 5, 2021
Merged

Transform repo into using nbdev framework #6

merged 40 commits into from Jul 5, 2021

Conversation

simeneide
Copy link
Collaborator

nbdev is a framework from fast.ai that develops everything in notebooks. It its heavily opinionated, but also gives us documentation and pip packaging "for free". Ive transformed the repo into this format.

Todo: My main issue now is that I removed the following lines from .gitattributes which means git lfs no longer works. But nbdev did not build with these present..

*.pt filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.json filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text

@NegatioN
Copy link
Collaborator

NegatioN commented Jul 1, 2021

.DS_Store should not be commited I think?

LICENSE Show resolved Hide resolved
README.md Outdated


True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to have: A little tqdm bar to see how the download progresses? Can be added to the ToDo's if you want it.

Copy link
Contributor

@SofieVerrewaere SofieVerrewaere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

README.md Outdated Show resolved Hide resolved
Before anything else, please install the git hooks that run automatic scripts during each commit and merge to strip the notebooks of superfluous metadata (and avoid merge conflicts). After cloning the repository, run the following command inside it:
```
nbdev_install_git_hooks
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to elaborate a bit more on the Dev Cycle, maybe also add a linter (black?)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add link to: https://github.com/fastai/nbdev/ to get people started ... and add run pip install nbdev or conda install -c fastai nbdev.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is the standard text from nbdev, but its interesting that they did not include these links :)

@simeneide
Copy link
Collaborator Author

@SofieVerrewaere :

nice to have: A little tqdm bar to see how the download progresses? Can be added to the ToDo's if you want it.

The underlying download library allowed to show the downloaded file size as a progress. Added that as an option.

@simeneide
Copy link
Collaborator Author

Google colab fails when loading and transforming the numpy data into a torch array. The main issue is the "slates" data. It is currently stored as int64, but we can try to reduce it to int32. Compressed saving only reduces from 1.2gb to 1.1gb, but the in-memory usage is halfed: 4555290000/9110580000 = 0.5

logging.info('Load data..')
data = {}
with np.load("{}/data.npz".format(data_dir)) as data_np:
    for key in data_np.keys():
      # Load and close for each item:
      with np.load("{}/data.npz".format(data_dir)) as data_tmp:
        print(key, data_tmp[key].nbytes)
        data[key] = data_tmp[key].astype("int32")
        #data[key] = torch.tensor(data_tmp[key], dtype=torch.int)

@simeneide
Copy link
Collaborator Author

I removed the attributes from git lfs to make nbdev work. So a bug now is that git lfs is not working properly. For usage, however, this is less relevant as users can download the files from gdrive using the buildin function.

I also created a int32 version of the data.npz file that is uploaded to gdrive, decreasing the memory footprint and makes it usable in google colab. I'll merge with these fixes.

@simeneide simeneide merged commit eb3a3a1 into main Jul 5, 2021
@simeneide simeneide deleted the nbdev branch August 11, 2021 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants