Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC - Hardlinking of duplicated pyc files #19

Merged
merged 1 commit into from
Feb 10, 2020
Merged

Conversation

frenzymadness
Copy link
Member

Proof of concept of hardlinking of duplicated pyc files.

There is one weakness I am aware of - compatibility with Python 3.4. Otherwise, it looks good and it also seems to work.

Fixes #16

@frenzymadness
Copy link
Member Author

Feel free to comment if you think that something can be done better.

Copy link
Member

@hroncok hroncok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, this looks OK. I would have done it in a similar way. I'm OK if 3.4 warns and ignores the option.

compileall2.py Outdated Show resolved Hide resolved
compileall2.py Outdated Show resolved Hide resolved
compileall2.py Show resolved Hide resolved
test_compileall2.py Outdated Show resolved Hide resolved
compileall2.py Outdated Show resolved Hide resolved
@frenzymadness
Copy link
Member Author

See the latest update. The Python 3.4 compatibility is kinda painful but we can keep it for now.

test_compileall2.py Show resolved Hide resolved
test_compileall2.py Outdated Show resolved Hide resolved
Copy link
Member Author

@frenzymadness frenzymadness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last commit I've improved the implementation for Python 3.4 when only (1, 2) optimization levels are used.

test_compileall2.py Outdated Show resolved Hide resolved
@hroncok
Copy link
Member

hroncok commented Jan 30, 2020

One test that I think we need to have:

  1. Compile a "deduplicable" module with compileall with the deduplication flag for all levels
  2. Change the module content
  3. Compile a "deduplicable" module with compileall with the deduplication flag for some levels only
  4. Assert only the proper files were changed, not all

Alternate version fo the test: Replace (3) with importing the module from Python.

@frenzymadness
Copy link
Member Author

One test that I think we need to have:

1. Compile a "deduplicable" module with compileall with the deduplication flag for all levels
$ echo "a = 0" > module.py
$ python -m compileall2 -o 0 -o 1 -o 2 --hardlink-dupes ./
Listing './'...
Compiling './module.py'...
$ ls -i __pycache__/
8792583 module.cpython-37.opt-1.pyc  8792583 module.cpython-37.pyc
8792583 module.cpython-37.opt-2.pyc
$ md5sum __pycache__/*
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-1.pyc
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-2.pyc
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.pyc
2. Change the module content
$ echo "b = 1" > module.py
3. Compile a "deduplicable" module with compileall with the deduplication flag for some levels only
$ python -m compileall2 -o 0 -o 2 --hardlink-dupes ./
Listing './'...
Compiling './module.py'...
4. Assert only the proper files were changed, not all
$ ls -i __pycache__/
8792583 module.cpython-37.opt-1.pyc  8792590 module.cpython-37.pyc
8792590 module.cpython-37.opt-2.pyc
$ md5sum __pycache__/*
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-1.pyc
dc87c896d4f87fba2c30da386fe701f3  __pycache__/module.cpython-37.opt-2.pyc
dc87c896d4f87fba2c30da386fe701f3  __pycache__/module.cpython-37.pyc

It seems that it works. I'll create a test.

I'd check that only compiled files in step 3 have different inode and checksum and the file compiled in the step 1 has the same inode/checksum as before. WDYT?

@hroncok
Copy link
Member

hroncok commented Jan 31, 2020

I'd check that only compiled files in step 3 have different inode and checksum and the file compiled in the step 1 has the same inode/checksum as before. WDYT?

Sounds good.

I'd also test this:

$ echo "a = 0" > module.py
$ python -m compileall2 -o 0 -o 1 -o 2 --hardlink-dupes ./
Listing './'...
Compiling './module.py'...
$ ls -i __pycache__/
8792583 module.cpython-37.opt-1.pyc  8792583 module.cpython-37.pyc
8792583 module.cpython-37.opt-2.pyc
$ md5sum __pycache__/*
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-1.pyc
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-2.pyc
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.pyc
$ echo "b = 1" > module.py
$ python -c 'import module'
$ ls -i __pycache__/
8792583 module.cpython-37.opt-1.pyc  8792590 module.cpython-37.pyc
8792583 module.cpython-37.opt-2.pyc
$ md5sum __pycache__/*
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-1.pyc
3d62806ca2e42ea0b8e3a96dd7a1cbb8  __pycache__/module.cpython-37.opt-2.pyc
dc87c896d4f87fba2c30da386fe701f3  __pycache__/module.cpython-37.pyc

@frenzymadness
Copy link
Member Author

Even though I think we are testing this too much, the tests are implemented in the two latest commits.

@frenzymadness frenzymadness marked this pull request as ready for review February 3, 2020 09:51
@hroncok
Copy link
Member

hroncok commented Feb 3, 2020

There's no such thing as testing too much 😄

This is a response to some of the things mentioned in the discussion.

compileall2.py Outdated Show resolved Hide resolved
compileall2.py Outdated Show resolved Hide resolved
compileall2.py Outdated Show resolved Hide resolved
@hroncok
Copy link
Member

hroncok commented Feb 5, 2020

So, technically, all the tests that assert equal inode numbers will fail on Windows? This is not a big deal in Fedora, but will need some tweaks when proposing upstream. I suppose some of the following:

  • disable the flag entirely on Windows, skip all the added tests on Windows
  • document that the flag is a no-op on Windows, skip inode equality asserts on Windows
  • upon linking the files, check inode numbers and when they differ, issue a warning and disable the flag globally, skip all the added tests on Windows, add a new one for Windows

The first option seems like the easiest one.

@hroncok
Copy link
Member

hroncok commented Feb 6, 2020

Anyway, for now, I think we can have this and test it in Fedora without worrying about Windows. WDYT?

@hroncok
Copy link
Member

hroncok commented Feb 6, 2020

Could you rebase this, so we see the tests results on Windows?

@frenzymadness
Copy link
Member Author

That's my plan. Rebase, clean, squash, skip tests on Windows…

@hroncok
Copy link
Member

hroncok commented Feb 6, 2020

The tests simply passed?

@frenzymadness
Copy link
Member Author

The tests simply passed?

It seems so :D I don't know how inodes work on Windows.

Copy link
Member

@hroncok hroncok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and release this, so we can backport it into Fedora from a released version? Or would you prefer to test it in copr first?

@frenzymadness
Copy link
Member Author

Let's merge and release this, so we can backport it into Fedora from a released version? Or would you prefer to test it in copr first?

It's more or less up to you. What do you prefer from minimization perspective? I'm ok with merge, release and use in Fedora directly.

@hroncok
Copy link
Member

hroncok commented Feb 10, 2020

Ok, let's get a merge and release please.

It is possible to save some disk space
on posix systems by using hard links for
identical pyc files produced for different
optimization levels.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: Option to hardlink duplicate bytecache files of different opt. levels
2 participants