New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve update performance (#799) #1341

Merged
merged 10 commits into from May 26, 2013

Conversation

Projects
None yet
3 participants
@thomasdziedzic
Copy link
Contributor

thomasdziedzic commented May 20, 2013

Remember the etag

@thomasdziedzic

This comment has been minimized.

Copy link
Contributor

thomasdziedzic commented May 20, 2013

Thinking about this issue offline, it might be wise to die later in the program instead of when we get the 304 response because another coder might be surprised by the behavior.

I don't think there is currently any issue with the code since it only gets used in downloadIndex and fetchRepoTarball. It's probably safe in both since if the index already exists, we do not want to extract it again. And fetchRepoTarball does checks to see if we already have the package so it will skip calling downloadURI.

Let me know what you think.

(2,0,0) -> Right (rspBody rsp)
(a,b,c) -> Left err
(2,0,0) -> Right rsp
(3,0,4) -> Left . ErrorMisc $ "Repo ETag matches. Nothing to do."

This comment has been minimized.

@23Skidoo

23Skidoo May 21, 2013

Member

Is this the message that will be shown to the user? Then it'd be better to reword it because the average user doesn't want to know what a "repo etag" is.

getHTTP :: Verbosity -> URI -> IO (Result (Response ByteString))
getHTTP verbosity uri = liftM (\(_, resp) -> Right resp) $
cabalBrowse verbosity (return ()) (request (mkRequest uri))
getHTTP :: Verbosity -> URI -> Maybe String -> IO (Result (Response ByteString))

This comment has been minimized.

@23Skidoo

23Skidoo May 21, 2013

Member

Please document what the etag parameter does in the haddock comment.


case result' of
Left err -> die $ "Failed to download " ++ show uri ++ " : " ++ show err
Right body -> do
Left err -> die $ show err

This comment has been minimized.

@23Skidoo

23Skidoo May 21, 2013

Member

I don't think that we should die when there's already an up-to-date version of the file. The program should be able to proceed as if the download has succeeded.

This comment has been minimized.

@thomasdziedzic

thomasdziedzic May 22, 2013

Contributor

That's a fair suggestion.
I implemented this behavior such that if our local file is up to date, I just continue the program as is.
One thing I still noticed is that cabal seems to still spend some time doing something, most likely extracting the 5.5MiB file we just skipped downloading. Do you think it would be desirable to skip the extraction if we skip downloading?

Here's an example of what I'm talking about.

[thomas-dziedzic@li430-81 cabal-install]$ time { ./dist/build/cabal/cabal update; }
Downloading the latest package list from hackage.haskell.org

real 0m7.150s
user 0m2.665s
sys 0m0.588s
[thomas-dziedzic@li430-81 cabal-install]$ time { ./dist/build/cabal/cabal update; }
Downloading the latest package list from hackage.haskell.org
Skipping download: Local and remote repositories match.

real 0m3.083s
user 0m2.384s
sys 0m0.485s

@23Skidoo

This comment has been minimized.

Copy link
Member

23Skidoo commented May 21, 2013

Thinking about this issue offline, it might be wise to die later in the program instead of when we get the 304 response because another coder might be surprised by the behavior.

This behaviour should at least be documented.

@23Skidoo

This comment has been minimized.

Copy link
Member

23Skidoo commented May 21, 2013

@gostrc Thanks for the patch. Overall looks good, please take a look at my comments.

@23Skidoo

This comment has been minimized.

Copy link
Member

23Skidoo commented May 22, 2013

One thing I still noticed is that cabal seems to still spend some time doing something, most likely extracting the 5.5MiB file we just skipped downloading. Do you think it would be desirable to skip the extraction if we skip downloading?

Sure, since the purpose of your patch is to improve the performance it makes sense to skip unnecessary processing. I suggest modifying the return type of downloadURI to make it possible for the client code to distinguish between the cases where the file has been updated and where we found out that there's an up-to-date version already.

err = ErrorMisc $ "Unsucessful HTTP code: "
++ concatMap show [a,b,c]

-- only write the etag if we get a 200 response code

This comment has been minimized.

@23Skidoo

23Skidoo May 22, 2013

Member

Sorry if it comes across as nitpicking, but please use proper punctuation and capitalisation (even though not all existing comments do).

@23Skidoo

This comment has been minimized.

I suggest using a sum type (i.e., something like DownloadResult = FileAlreadyInCache | FileDownloaded FilePath) instead. You can also use the same type in downloadURI.

This comment has been minimized.

Copy link
Owner

thomasdziedzic replied May 24, 2013

Thanks for the suggestion, I agree this would be better. I pushed a new commit to add a DownloadResult type.

Also I would like to thank you for taking your time in reviewing my code and offering suggestions.

@23Skidoo

This comment has been minimized.

Copy link
Member

23Skidoo commented May 25, 2013

@dcoutts @kosmikus

I'd like to merge this in. Please protest if you object.

@dcoutts

This comment has been minimized.

Copy link
Member

dcoutts commented May 26, 2013

Look fine, thanks @gostrc and @23Skidoo.

23Skidoo added a commit that referenced this pull request May 26, 2013

Merge pull request #1341 from gostrc/master
Improve update performance (#799)

@23Skidoo 23Skidoo merged commit 31baaa0 into haskell:master May 26, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment