Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code organization #34

Open
andrejj opened this issue Nov 18, 2014 · 1 comment
Open

Code organization #34

andrejj opened this issue Nov 18, 2014 · 1 comment

Comments

@andrejj
Copy link

andrejj commented Nov 18, 2014

Hi, first I want to say thank you for sharing this crawler and for the work you put in it.

Here is our experience with it and thoughts for improvements. I would be happy to know if you agree and if you would like to get this implemented (we can contribute of course).

We have a repository of code, we use for doing lots of data processing using resque.
We tried to use cobweb within our repository and here are our issues:

  1. name conflicts, classes are declared on a global level. Classes declared in cobweb should be name-spaced in a module. Example: Cobweb::Stats
  2. Sinatra loaded by default. We run our code on multiple machines with multiple processes. As I understand sinatra's purpose is to provide a UI for stats. We don't need/want it to be loaded every time on all boxes consuming memory and slowing down the boot time of our app. So this should be optional (example: 'require cobweb-web' or separate gem).
  3. files directive in gemspec. Everything you put in the files directive, can be loaded automatically. This again exposes naming conflicts. For example we use Fozzie that declares Stats module. But when you do 'require stats', you don't know which one is going to be loaded.
  4. sidekick vs resque, could be optional programmers decision and I would avoid auto detection
  5. logging should be configurable and puts statements should not be used. ruby Cobwbeb.logger = Logger.new

In conclusion this is what i have in mind:

require 'cobweb-resque'
# OR
require 'cobweb-sidekick'
require 'cobweb-web' # optional
Cobweb.logger = Logger.new("crawler.log")
@stewartmckee
Copy link
Owner

Hi Andej,

Agreed. Its kind of grown organically over time. Sounds like its due for a v2. :). Regarding your points,

  1. 110% agreed, should be refactored, there are a number of helper classes too that should just ne part of the main class too.
  2. Agreed on this one, was thinking of spliting to something similar to resque_web. Possibly bundling it as a seperate gem too because all the assets for the gui make the gem too big.
  3. Yes, namespacing and refactoring should solve this hopefully
  4. Yes, could do, thinking about it, explicit declaration of a queue system would be good, especially if your not using one, less dependancies.
  5. Hadn't thought about logging, good suggestion. The debug and quiet options can be removed and output for those can be mapped to a log level.

Cool, so all good suggestions. I've been hesitant to make breaking changes, but will start a v2 branch next time i get a chance. i'll give you a shout once it's there and if you've got any changes fire a pull request to it,

thanks,
Stewart.

On 18 Nov 2014, at 08:44, Andrej Jančič notifications@github.com wrote:

Hi, first I want to say thank you for sharing this crawler and for the work you put in it.

Here is our experience with it and thoughts for improvements. I would be happy to know if you agree and if you would like to get this implemented (we can contribute of course).

We have a repository of code, we use for doing lots of data processing using resque.
We tried to use cobweb within our repository and here are our issues:

name conflicts, classes are declared on a global level. Classes declared in cobweb should be name-spaced in a module. Example: Cobweb::Stats

Sinatra loaded by default. We run our code on multiple machines with multiple processes. As I understand sinatra's purpose is to provide a UI for stats. We don't need/want it to be loaded every time on all boxes consuming memory and slowing down the boot time of our app. So this should be optional (example: 'require cobweb-web' or separate gem).

files directive in gemspec. Everything you put in the files directive, can be loaded automatically. This again exposes naming conflicts. For example we use Fozzie that declares Stats module. But when you do 'require stats', you don't know which one is going to be loaded.

sidekick vs resque, could be optional programmers decision and I would avoid auto detection

logging should be configurable and puts statements should not be used. ruby Cobwbeb.logger = Logger.new

In conclusion this is what i have in mind:

require 'cobweb-resque'

OR

require 'cobweb-sidekick'
require 'cobweb-web' # optional
Cobweb.logger = Logger.new("crawler.log")

Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants