
Replace HTML parser with a new script #317

Merged
merged 1 commit into quicksilver:master on May 20, 2011

Conversation

@skurfer (Member) commented May 13, 2011

If an image has a title attribute, that should be the name of the object in the results. Quicksilver has always supported this, but the old Perl script was using a module that didn’t parse out title or alt attributes on images.

This Python script replaces the Perl script and returns a proper name back to QS for images. The name is taken from the first of these found:

  • The image’s title attribute
  • The image’s alt attribute
  • The image’s src attribute

The script relies only on features available by default in Mac OS X’s Python.
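
For illustration, here’s a minimal sketch of that selection logic using only the standard library. This isn’t the actual script from this pull request, just the idea: it’s written against Python 3’s html.parser (the module is named HTMLParser in the Python 2 that ships with Mac OS X), and the tab-separated output format is made up.

```python
# Hypothetical sketch: name each <img> from title, then alt, then src.
from html.parser import HTMLParser  # the "HTMLParser" module in Python 2
import sys

class ImageNamer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []  # (name, src) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Preference order from the list above: title, then alt, then src.
        name = a.get("title") or a.get("alt") or a.get("src", "")
        self.images.append((name, a.get("src", "")))

if __name__ == "__main__":
    parser = ImageNamer()
    parser.feed(sys.stdin.read())  # QS pipes the page's HTML in on STDIN
    for name, src in parser.images:
        print("%s\t%s" % (name, src))
```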

I’ll admit that it seems a little slower at times, but I think that’s a network issue because it’s not consistent, and local files seem to be parsed very quickly.

@pjrobertson (Member) commented May 20, 2011

It definitely seems to be slower for me.

I tested on a page with a decent number of links (http://macrumors.com/forums): with the Python script I got a beachball and had to wait 3–4s, whereas the Perl script did it in about 1s.

EDIT: I was assuming QS doesn't cache sites, but the second time I loaded the above link it was a lot quicker with the Python script.

Is the 'deciding on a name for images' part done in the script?
I still think you're right that the images should have better names, but I'm not sure it justifies the time increase.

But then again, I don't use it and probably won't (that often). I think @philostein uses this quite a bit (after seeing the blog posts), so he may be able to say which he'd prefer:
a) better labelling of images (with their title or alt preferred over the image URL), or
b) faster loading of pages within QS, with the image or link URL used as the label.

@skurfer (Member, Author) commented May 20, 2011

I’m not so sure that it’s the script after some more testing.

For example, I saved the main MacRumors page as a local HTML file and arrowed into it. It had 430 links on it and took no time at all. (Maybe 1/10s or less.)

> I was assuming QS doesn't cache sites, but the second time I loaded the above link it was a lot quicker with the Python script.

I don’t know. I don’t think QS is caching the site. I was going to suggest that it was a Python first-time-startup delay, but I don’t see that either after restarting QS. I think it just has to do with network latency.

> Is the 'deciding on a name for images' part done in the script?

Yes, but all the details are already there, so it’s just a couple of if statements.

These scripts work by having STDIN piped to them, so I’ll do some tests outside of QS with a few HTML files and post the results.
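
Something along these lines (a rough sketch, not the actual test: the script names are placeholders, and because the page is fetched once up front, this only times the parsing, not the network):

```python
# Rough harness: pipe the same HTML into each script and average
# wall-clock time over several runs. Script names are placeholders.
import subprocess
import time
import urllib.request

def time_script(cmd, html, runs=10):
    """Average wall-clock seconds to pipe `html` into `cmd` over `runs` runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, input=html, stdout=subprocess.DEVNULL, check=True)
        total += time.perf_counter() - start
    return total / runs

html = urllib.request.urlopen("http://macrumors.com/").read()
for cmd in (["python", "parse_links.py"], ["perl", "parse_links.pl"]):
    print("%-15s %.3fs" % (cmd[1], time_script(cmd, html)))
```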

@skurfer (Member, Author) commented May 20, 2011

OK, here’s some data. I took two sites with lots of links: Wired and MacRumors. I saved the main page locally and piped that through each script, then I used curl to pull them down in real-time and pipe that to each script. The local tests were always exactly the same, so I didn’t average them. The times with network are an average over 10 runs.

Site                Python   Perl
wired (local)       0.08s    0.03s
macrumors (local)   0.06s    0.03s
wired (net)         0.331s   0.360s
macrumors (net)     0.188s   0.179s

So the Perl script is faster in three of the four tests, but only by a couple of hundredths of a second. And when you involve the network, even that small advantage seems to go away.

@pjrobertson (Member) commented May 20, 2011

Twice as long locally, but I don't think you can complain about 0.08s ;) In the overall scheme of things, when the HTTP requests take 5 times longer, it's not really anything to worry about.

Looks good, and I'll merge unless there are any other complaints.

P.S. Some merging of my stuff would be nice (if they're OK, of course). I've got about 4 or 5 more things (can you believe it?!) that I want to make pulls for.


@skurfer (Member, Author) commented May 20, 2011

Yeah, I can go merge your web search icons at the very least. Just know that it’s going to create a merge conflict for this guy.

Git tangent

I’ve noticed that if I use TextMate’s “Edit Conflicts with FileMerge…”, the next time I commit, Git somehow knows I’ve resolved a conflict and commits only that file. On the other hand, if I just edit the conflicted file and take out one side or the other, the commit acts like a normal commit and includes all changed files, not just the conflicted one. I’ll do some Googling for an explanation, but this might be helpful to know.

@pjrobertson (Member) commented May 20, 2011

After playing with it a bit more, I definitely think it's a lot better. Image URLs were terrible in comparison!

@pjrobertson (Member) commented May 20, 2011

P.S. how about git's own git mergetool from Terminal? Seems to work OK for me :)

@pjrobertson merged commit 29d90bc into quicksilver:master on May 20, 2011

@pjrobertson (Member) commented May 20, 2011

I had to do a manual edit anyways. Mergetool didn't include all the files though.

Looks good :)

