Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

things I don't like about the json <--> hg conversion code #12342

Closed
williamstein opened this issue Jan 22, 2012 · 12 comments
Closed

things I don't like about the json <--> hg conversion code #12342

williamstein opened this issue Jan 22, 2012 · 12 comments

Comments

@williamstein
Copy link
Contributor

  1. It doesn't print a usage message when the script is just run:
wstein@sage:/scratch/wstein/sage-4.8-sage.math.washington.edu-x86_64-Linux$ ./sage -sh

Starting subshell with Sage environment variables set.
Be sure to exit when you are done and do not do anything
with other copies of Sage!

Bypassing shell configuration files ...

SAGE_ROOT=/mnt/usb1/scratch/wstein/sage-4.8-sage.math.washington.edu-x86_64-Linux
(sage subshell) sage:sage-4.8-sage.math.washington.edu-x86_64-Linux wstein$ cd spkg/base
SAGE_ROOT=/mnt/usb1/scratch/wstein/sage-4.8-sage.math.washington.edu-x86_64-Linux
(sage subshell) sage:base wstein$ ./json_bundle.py
SAGE_ROOT=/mnt/usb1/scratch/wstein/sage-4.8-sage.math.washington.edu-x86_64-Linux

There should have been a usage message detailing the options.

  1. There really are no tests at all of this. The ticket mercurial --> plain text --> mercurial #3052 where it was introduced talked about the challenge of writing tests, but then seemed to conclude with totally giving up. This is not acceptable. I'm totally opposed to ever introducing code into Sage that does anything useful if it doesn't have tests. If it isn't, tested then it will definitely be broken soon enough. This is exactly the sort of code that is likely to break too, due to Mercurial making some format change, us switching away from Mercurial etc. With no tests, the brokeness of this code could go unnoticed even if we switched all of sage to git.

  2. The name bundle.json for the JSON hg repo could clash with an existing file, and is confusing, since it doesn't mention hg at all. It would be much better to use .hg_bundle.json, since by starting with ".hg" it's clear it is something that is reserved for Mercurial usage.

  3. This code needs to be refactored. A bare minimum functionality for this should have been something like:

$ sage -hg_text_expand .hg
# produce .hg_bundle.json
$ sage -hg_text_collapse .hg_bundle.json
# create .hg or give an error if .hg is there.

Then the other functionality could be built on that. As it is, just doing the conversion either way runs a bunch of code in the middle of a for loop (over spkg's). Currently the most basic functionality (which could be easily tested) is obfuscated; if it were available other people could easily build tools on top of it to do what they need (e.g., with their own hg repos).

  1. A serious issue with this patch is that it seems to already be totally broken. Just take the sage-4.8 tarball, extract it, and type "make text-expand". What happens is:
wstein@sage:/scratch/wstein/2012-01-22/sage-4.8$ time make text-expand
./spkg/base/text-expand                                               
Determined SAGE_ROOT=/scratch/wstein/2012-01-22/sage-4.8              
Extracting SPKG: ./spkg/standard/boost-cropped-1.34.1.spkg ...        
Extracting SPKG: ./spkg/standard/jinja2-2.5.5.spkg ...       
Extracting SPKG: ./spkg/standard/atlas-3.8.4.p1.spkg ...              
Extracting SPKG: ./spkg/standard/fortran-20100629.spkg ...            
Deconstructing repo at ./spkg/standard/cddlib-094f.p10 ...            
************************************************************************
You must compile Sage first using 'make' in the Sage root directory.    
(If you have already compiled Sage, you must set the SAGE_ROOT variable 
in the file './sage').                                                  
************************************************************************
./spkg/base/text-expand: line 23: ./json_bundle.py: No such file or directory
rm: cannot remove `./spkg/standard/cddlib-094f.p10/bundle.hg': No such file or directory
Deconstructing repo at ./spkg/standard/genus2reduction-0.3.p8 ...  
...
** and hundreds more similar error messages **

Maybe I'm doing something wrong, but there are no directions, and if one can't just do "make text-expand" from a bare tarball, then there should be an immediate error message to that effect.

  1. Looking at the source code, I'm concerned about whether or not every .hg repo (e.g., even the base repo) is actually changed to JSON when "make text-expand" is run. This is just a concern. I do note that after doing "make text-expand" the hg repo in $SAGE_ROOT is gone, with nothing in its place. Not happy about that (maybe this is caused by everything going to hell above):
wstein@sage:/scratch/wstein/2012-01-22/sage-4.8$ ls -al
total 128
drwxr-xr-x 4 wstein wstein  4096 2012-01-22 08:28 .
drwxr-xr-x 4 wstein wstein  4096 2012-01-22 08:38 ..
-rw-r--r-- 1 wstein wstein 71842 2012-01-19 21:38 COPYING.txt
-rw-r--r-- 1 wstein wstein   366 2012-01-19 21:38 .hgignore
-rw-r--r-- 1 wstein wstein  1949 2012-01-19 21:38 .hgtags
drwxr-xr-x 2 wstein wstein  4096 2012-01-19 21:38 ipython
-rw-r--r-- 1 wstein wstein  4399 2012-01-19 21:38 Makefile
-rw-r--r-- 1 wstein wstein 11673 2012-01-19 21:38 README.txt
-rwxr-xr-x 1 wstein wstein  5046 2012-01-19 21:38 sage
drwxr-xr-x 4 wstein wstein  4096 2012-01-19 21:38 spkg

Anyway, needs work.

Component: distribution

Reviewer: Marc Mezzarobba

Issue created by migration from https://trac.sagemath.org/ticket/12342

@jhpalmieri
Copy link
Member

comment:1

Let me add to the list: I don't think these commands are documented anywhere (except in comments at the top of the scripts). They might be mentioned in the top-level README, and they should definitely be in the installation guide.

@williamstein
Copy link
Contributor Author

comment:2

The first thing I find is that this code is really bad

if __name__ == '__main__':
    try:                  
        import argparse   
    except:               
        alternative_cmdline_handler()

I'm sitting here forever trying to figure out how to use the code based on reading the parser stuff (since there's no guide or other documentation about what to do and the "make "), and finally I discover a naked try/except and that there is an alternative parser, which has completely different behavior. Confusing. In fact, argparse isn't even in sage-4.8 yet (it should be in 5.0).

@williamstein
Copy link
Contributor Author

comment:3

I'm going to do an update of sort to #3052 for #12342 -- at least I'm writing a bunch of relevant code now. Maybe you'll use it or at least referee my patch. Nobody but needs to worry about dealing with any of the issues at #12342 now.

@williamstein
Copy link
Contributor Author

Attachment: hg_json.gz

@kini
Copy link
Collaborator

kini commented Jan 23, 2012

comment:4

Good morning and happy Chinese New Year from Singapore! Thanks for making this followup ticket, as my code could certainly be improved.

The main reason I didn't explain the command line usage of json_bundle.py anywhere is that it was not really meant to be used by the user - the interface to this code was to have been through make text-expand and make text-collapse. The only reason there are command line arguments at all is that I thought someone might want to expand on the functionality of the .py file later. But certainly you're right that putting an explanation in anyway helps anyone who is trying to debug or modify the code, and it improves modularity do so as well.

Looking at your new code, minor comment about exceptions:

# this is legacy syntax which is removed in Python 3
raise ExceptionType, "message"
except ExceptionType, e
# this is the currently recommended syntax
raise ExceptionType("message")
except ExceptionType as e

I see a lot of the former syntax in Sage. The Python website tells me that "well-written 2.x code can be a lot like 3.x code", meaning, I guess, that much of the Python 3 syntax is already standard in 2.6 and 2.7 and we should use it. So I usually try to do this, but maybe it's too early to start planning for Python 3 in Sage? After all we only just got 2.7 in in Sage 5.0.

Anyway, you said you're going to work on this yourself, but if you want me to help with anything just let me know. I'll be keeping an eye on this ticket.

@williamstein
Copy link
Contributor Author

comment:5

Hi,

I wrote this script hg_json, which does something very useful standalone -- it swaps between having a .hg directory and having a .hg.json file. The hard work is all done by the code you had already written.

Thoughts?

I hope not to work much more on this ticket, and that you (Kini) could somehow finish it. We need to come up with a workflow. Maybe can have an option to sdist, e.g.,

sage -sdist --paranoid

which replaces all .hg repos by .hg.json files... for now (maybe removes binaries, e.g., that gfortran spkg), and when someday we switch to git, it will do the analogous thing. Then Sage's own build system will be smart enough to deal with the .hg.json files, so that a tarball built using --paranoid will build fine with no further changes. Now, that would be nice, right?

@kini
Copy link
Collaborator

kini commented Feb 4, 2012

comment:6

git has a beautiful command called fast-export which does exactly what we want, and is a lot better and more robust than my code. git fast-export --all > filename will produce a text file containing all commit data ("commits"), file data ("blobs"), filename data ("trees"), pointers ("refs"), and pointer movement history ("the reflog") in a human-readable format. git fast-import < filename performed on an empty repository will reproduce the original repository.

fast-export is more thorough than my code, which only exports the first three of the five things I listed above, and loses the other two - not that we are using refs ("bookmarks" in Mercurial) or reflogs (not existent in Mercurial afaik) to do anything in Sage at the moment. Also, while my code just attempts to make human-readable the deltas between changesets, fast-export actually dumps every separate version of every file in the repository, which is I guess more "readable" (though this also results in much larger output). This is actually the natural way to do things in git, because git's most abstract conception of the repository actually does contain every separate version of every file, whereas Mercurial's level of abstraction is lower and stops at deltas between changesets.

If you plan on switching to git anytime soon, I recommend we just drop this ticket for now and work on getting fast-export into our build scripts instead. We already have a git-based Sage library repository; I can make analogous ones for the scripts, extcode, and root repos, though it's probably a good idea to have a long hard think about how we can consolidate our repos as much as possible, as the switch from hg to git would be the ideal time to do massive history rewriting (which git is good at, by the way, using git filter-branch). The SPKG system in particular needs a rework, IMO. We should have one git repo containing all the patches, spkg-install scripts, SPKG.txt files, etc. for all the packages we offer, standard/optional/experimental/whatever, and then just tar -cj the src/ directories separately. I think we should take a look at what Burcin is doing with lmonade for clues.

I timed fast-export and fast-import of the git repository I'm maintaining on github, as well as compression and decompression into gz, bz2, xz, and lrz archives (and the resulting filesizes):

fs@boone ~/src $ bash -x fast-export-test.sh 
+ cd sagelib
+ git fast-export --all

real	0m35.897s
user	0m33.348s
sys	0m1.494s
+ cd ..
+ du -h sagelib.fast_export
1.8G	sagelib.fast_export
+ gzip sagelib.fast_export

real	0m52.549s
user	0m50.681s
sys	0m0.779s
+ du -h sagelib.fast_export.gz
392M	sagelib.fast_export.gz
+ gunzip sagelib.fast_export

real	0m13.212s
user	0m9.598s
sys	0m0.882s
+ bzip2 sagelib.fast_export

real	2m29.877s
user	2m28.340s
sys	0m0.858s
+ du -h sagelib.fast_export.bz2
264M	sagelib.fast_export.bz2
+ bunzip2 sagelib.fast_export.bz2

real	0m53.294s
user	0m50.876s
sys	0m1.372s
+ xz sagelib.fast_export

real	10m49.975s
user	10m46.808s
sys	0m1.329s
+ du -h sagelib.fast_export.xz
107M	sagelib.fast_export.xz
+ unxz sagelib.fast_export.xz

real	0m10.790s
user	0m8.476s
sys	0m0.968s
+ lrzip -D sagelib.fast_export
Output filename is: sagelib.fast_export.lrz
sagelib.fast_export - Compression Ratio: 131.246. Average Compression Speed: 26.304MB/s.
Total time: 00:01:09.06

real	1m9.066s
user	1m12.394s
sys	0m0.750s
+ du -h sagelib.fast_export.lrz
14M	sagelib.fast_export.lrz
+ lrunzip -D sagelib.fast_export.lrz
Output filename is: sagelib.fast_export
Decompressing...
100%    1815.25 /   1815.25 MB
Average DeCompression Speed: 181.500MB/s
Output filename is: sagelib.fast_export: [OK] - 1903430998 bytes                                
Total time: 00:00:13.12

real	0m13.172s
user	0m4.774s
sys	0m1.624s
+ git init sagelib2
Initialized empty Git repository in /home/fs/src/sagelib2/.git/
+ cd sagelib2
+ git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     125000
Total objects:       120380 (      9143 duplicates                  )
      blobs  :        41942 (         0 duplicates      32577 deltas of      39920 attempts)
      trees  :        61752 (      9143 duplicates      55847 deltas of      56256 attempts)
      commits:        16686 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:         459 (       494 loads     )
      marks:        1048576 (     58628 unique    )
      atoms:           3025
Memory total:          8094 KiB
       pools:          2235 KiB
     objects:          5859 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =     322098
pack_report: pack_mmap_calls          =      16687
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  435565281 /  435565281
---------------------------------------------------------------------


real	1m22.824s
user	1m20.089s
sys	0m1.329s

Tests performed on a VirtualBox VM running on a 4-core i5-2500K @ 4.5 GHz with 8 GB of RAM.

In summary: fast-export produces a 1.8 GB text file in about 30 seconds. This can be compressed into a 392 MB gz file in about a minute, a 264 MB bz2 file in about two and a half minutes, a 107 MB xz file in about eleven minutes, or a 14 MB lrz file in about a minute. These can be decompressed in about ten seconds, a minute, ten seconds, and ten seconds respectively. Importing the fast-export dump using fast-import takes about a minute and a half.

So obviously we should try to compress our "paranoid source archive" with lrzip if at all possible (it actually compresses down to less than half the size of the git repository itself!). We don't need to ship this with sage unless you want all users to be able to produce paranoid source archives. Users who want the source archive will of course need to install lrzip themselves.

@sagetrac-mariah
Copy link
Mannequin

sagetrac-mariah mannequin commented Feb 9, 2012

comment:7

The attachment [https://github.com/sagemath/sage/files/ticket12342/hg_json.gz does not work on iml-1.0.1.p14.spkg,
specifically in iml-1.0.1.p14/src/src after doing hg->json
and now trying to convert back, json->hg:

%hg_json .
Converting .hg.json plaintext file to .hg repo...
hg init "."
hg pull -R "." "./.hg.json.bundle"
pulling from ./.hg.json.bundle
requesting all changes
adding changesets
adding manifests
adding file changes
added 4 changesets with 34 changes to 25 files
(run 'hg update' to get a working copy)
hg update -R "."
abort: untracked file in working directory differs from file in requested revision: 'nullspace.c'
Traceback (most recent call last):
  File "hg_json", line 279, in <module>
    convert(args[0], destructive)
  File "hg_json", line 251, in convert
    convert_json_to_hg(path, destructive)
  File "hg_json", line 220, in convert_json_to_hg
    system('hg update -R "%s"'%path)
  File "hg_json", line 172, in system
    raise RuntimeError, "error running command '%s'"%cmd
RuntimeError: error running command 'hg update -R "."'
%

Or perhaps there is something wrong with .hg in
iml-1.0.1.p14.spkg?

@kini
Copy link
Collaborator

kini commented Mar 2, 2012

comment:8

William, can you comment on my suggestion to abandon this until we switch to git, and the reasoning behind my suggestion?

@williamstein
Copy link
Contributor Author

comment:9

Replying to @kini:

William, can you comment on my suggestion to abandon this until we switch to git,
and the reasoning behind my suggestion?

I'm fine with you abandoning this, but I hope it gets done. There is no telling when we will actually switch to git. I hope somebody does finish this ticket.

@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014
@fchapoton
Copy link
Contributor

comment:12

Now we have switched to git..

@fchapoton fchapoton removed this from the sage-6.2 milestone Mar 4, 2014
@mezzarobba
Copy link
Member

Reviewer: Marc Mezzarobba

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants