Question about parsing Urls dynamically. #32

hazelweakly · 2017-11-28T02:29:44Z

I'm writing a 404-checker for funsies and I'm currently wondering what the best way to go about this is. The program is given a list of websites and checks their sitemap, grabs every page of the site, and then gets the response code of every link on every page.

Now, this is hilariously inefficient, so the first step is to put every link on the website into a Set and keep only the anchor tags that are relative links or http/https links. That leaves me with a problem: the requests will be a mix of Url 'Https and Url 'Http.

Essentially, I can't figure out how to have type-safe URLs without writing almost the entire program twice. Is there a way to do this? Or should I think about redesigning the program somehow (although I'm not sure how I could, short of using raw ByteStrings everywhere)?

(PS: I like the correct by construction promise, but I've run into lots of small little paper-cuts with it and I've actually gotten farther by writing large pattern matching functions and manually building urls than I have by using the parseUrl functions in the library. I'm not quite sure what to do about that to make the Urls easier to use.)


mrkkrp · 2017-11-28T03:40:37Z

I would do all transformations on ByteStrings. You could actually use the newer modern-uri package for stripping fragments/etc. Then you just parse/detect whether the scheme is http or https right before performing actual HTTP request.

There are two ways to do that:

1. If you go with modern-uri, you could just check the uriScheme component and dispatch on that.
2. You could try running both parseUrlHttp and parseUrlHttps on the same URL. Only one of them will succeed (that is, return a Just value). Then you pattern-match on the result and send the request.

The type of HTTP response is the same for both schemes, so after the request you can merge the control flow back.
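
A minimal sketch of the "try both parsers, then merge" idea, using only base so it runs on its own. Everything here is a simplified stand-in and an assumption for illustration: `Url`, `parseHttp`, `parseHttps`, `AnyUrl`, `parseAny` and `checkLink` are hypothetical names, not the library's API, and no real request is sent. The point is only that the scheme-specific code can be confined to one pattern match whose branches share a result type.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

module Main (main) where

import Control.Applicative ((<|>))
import Data.List (stripPrefix)

-- Stand-in for a scheme-indexed URL type (hypothetical and simplified:
-- here it only stores the part of the link after the scheme).
data Scheme = Http | Https

newtype Url (s :: Scheme) = Url String

-- Hypothetical parsers. For any given input at most one of them succeeds.
parseHttp :: String -> Maybe (Url 'Http)
parseHttp s = Url <$> stripPrefix "http://" s

parseHttps :: String -> Maybe (Url 'Https)
parseHttps s = Url <$> stripPrefix "https://" s

-- Sum type that hides the scheme index, so links of both kinds can be
-- kept in one collection.
data AnyUrl
  = AnyHttp (Url 'Http)
  | AnyHttps (Url 'Https)

-- Try the http parser first and fall back to the https parser.
parseAny :: String -> Maybe AnyUrl
parseAny s = (AnyHttp <$> parseHttp s) <|> (AnyHttps <$> parseHttps s)

-- The only place where the two schemes are handled separately. Both
-- branches return the same type, so callers never see the scheme index.
-- A real checker would perform the request here instead of building a String.
checkLink :: AnyUrl -> String
checkLink (AnyHttp (Url rest)) = "would send an http request to " ++ rest
checkLink (AnyHttps (Url rest)) = "would send an https request to " ++ rest

main :: IO ()
main =
  mapM_
    (putStrLn . maybe "skipped (not http/https)" checkLink . parseAny)
    [ "http://example.com/a"
    , "https://example.com/b"
    , "mailto:someone@example.com"
    ]
```

With this shape, the rest of the program (collecting links, deduplicating, reporting) can work on plain strings or on `AnyUrl` values and does not need to be written once per scheme.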

Sorry for the trouble, but this is just the flip side of libraries with strong static guarantees. They become less handy in more dynamic situations. Even things like Servant suffer from this.

mrkkrp · 2017-11-30T15:28:52Z

Did my answer help? Feel free to ask questions if something is not clear!

mrkkrp · 2017-12-03T15:23:40Z

Closing due to the lack of response. Feel free to re-open if necessary.

hazelweakly · 2017-12-03T21:03:20Z

Sorry for not replying! Your answer helped quite a lot, I just haven't had much spare time lately to tinker with things (it's finals week). It's been fun getting a better understanding of the balance between the strength of types and allowing enough freedom to do what you want. If you're curious, you can see the (super ugly) rough draft of what I have so far in my urlchecker repo. It's in quite a state of disarray right now since I was trying a few things out and never cleaned it back up, so fair warning :p

mrkkrp added the question label Nov 28, 2017

mrkkrp closed this as completed Dec 3, 2017
