Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crawly custom settings #72

Merged
merged 4 commits into from Apr 6, 2020
Merged

Crawly custom settings #72

merged 4 commits into from Apr 6, 2020

Conversation

oltarasenko
Copy link
Collaborator

  1. Extend Crawly.Spider behaviour with otional custom_settings
  2. Add utility function to extract settings from a global config
    or spider's custom settings

Copy link
Collaborator

@Ziinc Ziinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the term custom_settings is not really intuitive of what it actually does.
Essentially, we are overriding certain default settings in config.exs for only this specific spider.
I think using override_settings might be better.

lib/crawly/utils.ex Outdated Show resolved Hide resolved
lib/crawly/spider.ex Outdated Show resolved Hide resolved
lib/crawly/spider.ex Outdated Show resolved Hide resolved
@Ziinc
Copy link
Collaborator

Ziinc commented Mar 31, 2020

I think we should also provide a macro to make simple overrides easy (for some syntactic sugar) , like in my example in the related issue, to get ride of some of the boilerplate. Though this should probably be split to a separate issue.

@Ziinc
Copy link
Collaborator

Ziinc commented Mar 31, 2020

Also add in some tests for the following situations:

  • user declares callback but doesn't return a specific option.
  • user declares callback that returns nothing or wrong type

@oltarasenko
Copy link
Collaborator Author

@Ziinc thanks for comments. I was not addressing them yet, as I was in the middle of more massive changes already. Will have a closer look soon!

@oltarasenko oltarasenko force-pushed the custom_settings branch 2 times, most recently from a6ecbf2 to 4373d23 Compare April 1, 2020 14:15
@oltarasenko
Copy link
Collaborator Author

@Ziinc can you please have another glance on it?

@oltarasenko oltarasenko changed the title WIP: Crawly custom settings Crawly custom settings Apr 1, 2020
@oltarasenko oltarasenko requested a review from Ziinc April 1, 2020 18:19
config/config.exs Outdated Show resolved Hide resolved
documentation/configuration.md Outdated Show resolved Hide resolved
documentation/configuration.md Outdated Show resolved Hide resolved
lib/crawly/manager.ex Outdated Show resolved Hide resolved
lib/crawly/manager.ex Outdated Show resolved Hide resolved
test/settings_test.exs Outdated Show resolved Hide resolved
end)
end

test "settings from the spider are overriding globals" do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an additional test where no global setting is defined, and no spider setting is defined, should throw an error-reason tuple.

This would be for developer's debugging convenience

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well currently Crawly is built with an idea that it should run even without settings. We have default fallback functions for most or all settings.

This idea can be discussed, but lets maybe do it in a separate issue/pr

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, was thinking along the lines of when devs create their own pipelines/middlewares and want to define/get their own settings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, it still should be possible

defp get_spider_setting(_setting_name, nil), do: nil

defp get_spider_setting(setting_name, spider_name) do
case function_exported?(spider_name, :settings_override, 0) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pass the global config as first arg to the callback? I think it might be beneficial, for example, adjusting pipelines/middlewares on the fly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please show an example? In general, I don't understand the plan yet. However, if you can show samples I am happy to include it.

Otherwise, if it's private we can't override it, right?

@Ziinc
Copy link
Collaborator

Ziinc commented Apr 1, 2020

Overall looks great 👍 , just some minor changes. The docs on how to access settings in pipelines/middlewares will need to be updated too

1. Extend Crawly.Spider behaviour with otional custom_settings
2. Add utility function to extract settings from a global config
   or spider's custom settings
@oltarasenko
Copy link
Collaborator Author

@Ziinc I have addressed most (if not all) comments you have made. Could we make another round of the review please :)?

@oltarasenko oltarasenko requested a review from Ziinc April 1, 2020 20:14
documentation/configuration.md Show resolved Hide resolved
end
```

The full list of overridable settings:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to list this? all settings are overridable, from my understanding.

end)
end

test "settings from the spider are overriding globals" do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, was thinking along the lines of when devs create their own pipelines/middlewares and want to define/get their own settings.

lib/crawly/manager.ex Outdated Show resolved Hide resolved
lib/crawly/manager.ex Outdated Show resolved Hide resolved

def get_settings(setting_name, spider_name \\ nil, default \\ nil) do
global_setting = Application.get_env(:crawly, setting_name, default)
case get_spider_setting(setting_name, spider_name) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

def get_settings(setting_name, spider_name \\ nil , defualt \\ nil) do
    global = Application.get_env(:crawly, setting_name, default)
    overrides = cond do
        function_exported?(spider_name, :override_settings, 1) when is_atom(spider_name) ->
            spider_name.override_settings(global)
        function_exported?(spider_name, :override_settings, 0) when is_atom(spider_name) ->
            spider_name.override_settings()
        true -> 
           []
    end
    merged = Keyword.merge(global, overrides)
    Keyword.get(merged, setting_name, default)
end

Allows for more advanced logic in the override function, and the user doesn't have to do a Application.get_env in the spider module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure about it. Lets maybe discuss it in a separate issue.

@oltarasenko oltarasenko requested a review from Ziinc April 2, 2020 15:51
@oltarasenko
Copy link
Collaborator Author

@Ziinc I have addressed new code review comments. Let's have another glance. I would suggest discussing all more complex capabilities of custom settings separately, as this PR is quite large already. Also maybe we can do it as a part of callbacks related discussion. (Maybe it's time when we will convert Spider to macro-based definition)

Copy link
Collaborator

@Ziinc Ziinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@oltarasenko oltarasenko merged commit 8e89eb9 into master Apr 6, 2020
@Ziinc
Copy link
Collaborator

Ziinc commented Apr 7, 2020

🎉🎉🎉🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants