Question about parsing Urls dynamically. #32

Closed
hazelweakly opened this issue Nov 28, 2017 · 4 comments

@hazelweakly

I'm writing a 404-checker for funsies and I'm currently wondering what the best way to go about this is. The program is given a list of websites and checks their sitemap, grabs every page of the site, and then gets the response code of every link on every page.

Now, this is hilariously inefficient, so the first thing to do is to put every single link on the website into a Set and keep only the anchor tags that are relative links or http/https links. That leaves me with a problem: the requests will be a mix of type Url 'Https and Url 'Http.

Essentially, I can't figure out how to have type-safe URLs without almost writing the entire program twice. Is there a way to do this? Or should I think about redesigning the program somehow (although I'm not sure how I could, save for just using raw ByteString everywhere)?

(PS: I like the correct-by-construction promise, but I've run into lots of small paper cuts with it, and I've actually gotten farther by writing large pattern-matching functions and manually building URLs than I have by using the parseUrl functions in the library. I'm not quite sure what to do about that to make the Urls easier to use.)

@mrkkrp
Owner

mrkkrp commented Nov 28, 2017

I would do all transformations on ByteStrings. You could actually use the newer modern-uri package for stripping fragments, etc. Then you just parse/detect whether the scheme is http or https right before performing the actual HTTP request.

There are two ways to do that:

  1. If you go with modern-uri, you could just check the uriScheme component and dispatch on that.
  2. You could try running both parseUrlHttp and parseUrlHttps on the same URL. Only one of them will succeed (return something in Just). Then you pattern-match on that and send the request.

The type of HTTP response is the same for both schemes, so after the request you can merge the control flow back.
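The second approach can be sketched roughly as follows. This is a minimal, self-contained illustration that uses only base: `Url`, `parseHttp`, `parseHttps`, `AnyUrl`, `classify`, `describe` and `dispatch` are hypothetical stand-ins invented here to mirror the shape of the library's scheme-indexed `Url` and its `parseUrlHttp`/`parseUrlHttps` functions. They are not the library's API, and `describe` merely stands in for the code that would send the request.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Control.Applicative ((<|>))
import Data.List (stripPrefix)

-- Stand-in for a scheme-indexed URL type (hypothetical, not the library's).
data Scheme = Http | Https

newtype Url (s :: Scheme) = Url String

-- Hypothetical parsers shaped like parseUrlHttp / parseUrlHttps:
-- each one accepts only its own scheme and returns Nothing otherwise.
parseHttp :: String -> Maybe (Url 'Http)
parseHttp = fmap Url . stripPrefix "http://"

parseHttps :: String -> Maybe (Url 'Https)
parseHttps = fmap Url . stripPrefix "https://"

-- A sum type that remembers which of the two parses succeeded.
data AnyUrl
  = HttpUrl (Url 'Http)
  | HttpsUrl (Url 'Https)

-- Try both parsers on the same raw string; at most one can succeed.
classify :: String -> Maybe AnyUrl
classify raw = (HttpUrl <$> parseHttp raw) <|> (HttpsUrl <$> parseHttps raw)

-- Scheme-polymorphic work, written once and usable from both branches.
-- In a real checker this is where the request would be sent.
describe :: String -> Url s -> String
describe label (Url rest) = label ++ " request to " ++ rest

-- The only place that inspects the scheme. Both branches return the
-- same type, so the control flow merges again after this point.
dispatch :: AnyUrl -> String
dispatch (HttpUrl u) = describe "http" u
dispatch (HttpsUrl u) = describe "https" u

main :: IO ()
main =
  mapM_
    (putStrLn . maybe "skipped: unsupported scheme" dispatch . classify)
    [ "https://example.com/a"
    , "http://example.com/b"
    , "mailto:someone@example.com"
    ]
```

The point of the sketch is that the raw strings stay untyped while they are collected and deduplicated, and the scheme only shows up in the types inside `dispatch`, so nothing else has to be written twice.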

Sorry for the trouble, but this is just the flip side of libraries with strong static guarantees. They become less handy in more dynamic situations. Even things like Servant suffer from this.

@mrkkrp
Owner

mrkkrp commented Nov 30, 2017

Did my answer help? Feel free to ask questions if something is not clear!

@mrkkrp
Owner

mrkkrp commented Dec 3, 2017

Closing due to the lack of response. Feel free to re-open if necessary.

mrkkrp closed this as completed Dec 3, 2017

@hazelweakly
Author

Sorry for not replying! Your answer helped quite a lot, I just haven't had a lot of spare time lately to tinker on things (it's finals week this week). It's been fun getting a better understanding of the balance between the strength of types and allowing enough freedom to do what you want. If you're curious, you can see the (super ugly) rough draft of what I have so far in my urlchecker repo. It's in quite a state of disarray right now since I was trying a few things out and never cleaned it back up, so fair warning :p
