Overview

This is an experiment to scrape the Transports Quebec infrastructure database and save the data in different, easily usable formats.

Please see this blog post for the technical details or this blog post for the story and output files in CSV, JSON, LineJSON, XML and KML format.
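As a rough illustration of what "different formats" means here, the sketch below serializes the same records to JSON, line-delimited JSON (one object per line, which is how I read "LineJSON") and CSV using only the Python 3 standard library. The field names and sample values are made up for the example; they are not the scraper's actual output schema, and this is not the export code used by the project.

```python
import csv
import io
import json

# Hypothetical records; the real scraper's fields are not shown in this README.
records = [
    {"id": "00001", "type": "Pont", "municipality": "Montréal"},
    {"id": "00002", "type": "Ponceau", "municipality": "Québec"},
]


def to_json(rows):
    """Serialize all rows as a single JSON array."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_json_lines(rows):
    """Serialize each row as one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def to_csv(rows):
    """Serialize rows as CSV, using the keys of the first row as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_json(records))
    print(to_json_lines(records), end="")
    print(to_csv(records), end="")
```

The line-delimited variant is handy for large scrapes because each line can be parsed on its own, without loading the whole file.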

Requirements (Tested On)

Not tested with newer versions of the above. YMMV.

NOTE

I currently have no plan to "support" this project. However, if you find and fix issues (e.g. stuff that no longer works because the HTML being scraped has changed) or add features, feel free to send me pull requests.

If you find an issue that you yourself have no plan to fix, feel free to open a ticket to let me know. Maybe by that time I will have found a portal to another dimension where I have extra time, or a clone that would allow me to work on it.

Cheers!

UPDATE 2013/10/18

Roberto Rocca @robroc from The Gazette asked me if I had a recent scrape of the MTQ database.

I had not looked at this code in a long time and I was curious to see if it still worked. It did not.

However, after some tests in the Scrapy shell and a look at the HTML source, I realized that little would be needed to fix things. So I found some time to update the scraper so that it works on the current MTQ website. Mostly, I had to change the base URL, the table IDs, and the XPath selector used to get the structure photo URL.
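To make the kind of change described above concrete, here is a minimal sketch of pulling a photo URL out of a details table by its ID. Everything specific in it is an assumption for illustration: the base URL, the table ID `details`, the sample markup and the helper name `photo_url` are hypothetical, not the values used by the scraper or the MTQ website. It also uses the standard library's limited XPath support on well-formed markup (Python 3), whereas the scraper itself relies on Scrapy's selectors, which cope with real-world HTML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical values: neither the base URL nor the table ID comes from the real site.
BASE_URL = "http://example.invalid/structures/"
TABLE_ID = "details"

# Hypothetical, well-formed sample of a details page.
SAMPLE = """
<html><body>
  <table id="summary"><tr><td><a href="other.html">Other</a></td></tr></table>
  <table id="details">
    <tr><td>Photo</td><td><a href="photos/12345.jpg"><img src="thumb.jpg" /></a></td></tr>
  </table>
</body></html>
"""


def photo_url(html, table_id=TABLE_ID, base_url=BASE_URL):
    """Return the absolute URL of the first link in the given table, or None."""
    root = ET.fromstring(html.strip())
    # Step 1: locate the table by its id attribute.
    table = root.find(".//table[@id='%s']" % table_id)
    if table is None:
        return None
    # Step 2: take the first anchor anywhere inside that table.
    link = table.find(".//a")
    if link is None or link.get("href") is None:
        return None
    return urljoin(base_url, link.get("href"))


if __name__ == "__main__":
    print(photo_url(SAMPLE))
```

When a site redesign renames the table or moves the link, only `TABLE_ID` and the two path expressions need to change, which matches how small the fix turned out to be.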

NOTE: I did not test the code with the latest and greatest Scrapy version. Instead, to save myself trouble, I went with one of the oldest versions available on PyPI (0.14.4), which did not require any change to my code.

NOTE 2: The latest version of the MTQ website uses cookies to track the session, so pages past the initial form submission cannot simply be fetched on their own. To pause the crawl and inspect a response received after that submission, call inspect_response in parse_main_list or parse_details.
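One possible way to wire this in without leaving a breakpoint in the code permanently is a small guard around inspect_response. This is a sketch under assumptions, not the project's code: the helper name `maybe_inspect` and the `MTQ_DEBUG` environment variable are invented for the example, and the `inspect_response(response, spider)` call follows recent Scrapy releases; older releases such as 0.14.4 may expect different arguments, so check the documentation for the version you run.

```python
import os


def maybe_inspect(response, spider, enabled=None):
    """Open the Scrapy shell on `response` only when debugging is switched on.

    Returns True if the shell was opened, False otherwise.
    """
    if enabled is None:
        # Hypothetical switch: set MTQ_DEBUG=1 in the environment to enable.
        enabled = os.environ.get("MTQ_DEBUG") == "1"
    if not enabled:
        return False
    # Imported lazily so the helper can be loaded without Scrapy installed.
    from scrapy.shell import inspect_response

    # Drops into an interactive shell with the response (and the session
    # cookies already negotiated by the crawl) available for inspection.
    inspect_response(response, spider)
    return True
```

Calling `maybe_inspect(response, self)` as the first statement of parse_main_list or parse_details would then stop the crawl at that callback when the switch is set and do nothing otherwise.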
