
autoscab - generalizing kelloggbot #56

Open
sneakers-the-rat opened this issue Dec 15, 2021 · 8 comments

Comments

@sneakers-the-rat

love what you've done here. am rearchitecting to make easier to extend for other unions in the future with some prior webscraping code i've written. absolutely LOVE the resume generator. would love to work together, solidarity forever <3

@sneakers-the-rat
Author

PS i got autoscab on pypi lmao lets do this

https://pypi.org/project/autoscab/
https://github.com/sneakers-the-rat/autoscab

@bolshoytoster
Contributor

bolshoytoster commented Dec 15, 2021

We could take a dictionary of {xpath: action}, where xpath is a string and action is an instance of an action class. There could be a base action class:

class Action:
    def __init__(self, fun):
        '''
        Create Action from function/lambda.
        It will be passed the element at xpath, so it must accept one argument.
        '''
        # Stored on the instance, so it is called as action.fun(element)
        self.fun = fun

Then have two standard ones:

import random

class Click(Action):
    def __init__(self):
        # Clicking needs no configuration beyond the element itself
        self.fun = lambda element: element.click()

class Input(Action):
    def __init__(self, inputs):
        '''
        Inputs is either a string, a list of possible strings to choose from, or a function/lambda that returns the string to use
        '''
        if isinstance(inputs, list):
            self.fun = lambda element: element.send_keys(random.choice(inputs))
        elif callable(inputs):
            self.fun = lambda element: element.send_keys(inputs())
        else: # Assume it's either a string or can be cast to a string
            self.fun = lambda element: element.send_keys(str(inputs))

Then have a function that takes the dictionary as input and carries out the actions, perhaps even in a loop:

from itertools import count
from selenium.webdriver.common.by import By

def autoscab(actions, times=0):
    '''
    actions is a dictionary:
        {
         xpath (string): action (an instance of Action or of a class that inherits from it)
        }

    times is the number of times to run this, 0 for infinite
    '''
    if times == 0:
        iterate = count()
    else:
        iterate = range(times)

    for _ in iterate:
        for xpath, action in actions.items():
            # driver is assumed to be an already-created Selenium WebDriver
            element = driver.find_element(By.XPATH, xpath)
            action.fun(element)

This is a rough draft. It probably needs some more error handling, though it may be best to let the user handle errors. It should work for most applications, and updating it mostly means changing a dictionary.
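To make the {xpath: action} idea concrete, here is a minimal, self-contained sketch that runs such a dictionary against a stub driver instead of a real browser. FakeDriver, FakeElement, run_actions, and the XPaths are all hypothetical names made up for illustration; the action classes are a simplified restatement of the proposal above, not code from the repo.

```python
import random
from itertools import count


class Action:
    """Wraps a one-argument callable that acts on a page element."""
    def __init__(self, fun):
        self.fun = fun


class Click(Action):
    def __init__(self):
        super().__init__(lambda element: element.click())


class Input(Action):
    def __init__(self, inputs):
        if isinstance(inputs, list):
            super().__init__(lambda el: el.send_keys(random.choice(inputs)))
        elif callable(inputs):
            super().__init__(lambda el: el.send_keys(inputs()))
        else:
            super().__init__(lambda el: el.send_keys(str(inputs)))


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; records calls."""
    def __init__(self, log, xpath):
        self.log, self.xpath = log, xpath

    def click(self):
        self.log.append((self.xpath, "click"))

    def send_keys(self, text):
        self.log.append((self.xpath, "type", text))


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; no browser is started."""
    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return FakeElement(self.log, value)


def run_actions(driver, actions, times=1):
    # times=0 means loop forever, as in the proposal
    rounds = count() if times == 0 else range(times)
    for _ in rounds:
        for xpath, action in actions.items():
            action.fun(driver.find_element("xpath", xpath))


driver = FakeDriver()
actions = {
    '//input[@name="first_name"]': Input(["Ada", "Grace"]),
    '//input[@name="email"]': Input(lambda: "user@example.com"),
    '//button[@type="submit"]': Click(),
}
run_actions(driver, actions, times=2)
```

Because dictionaries keep insertion order, the form fields are filled top to bottom and the submit button is clicked last in each round.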

@bolshoytoster
Contributor

bolshoytoster commented Dec 15, 2021

> I like the idea, but I'd be worried about that being limiting.

Fair enough, but if you just want to make a quick bot that's probably a good place to start, until people have time to make actual bots. Technically it could also work for purposes other than this but this is the main focus.

> It might be better to componentize the project into a set of tools - resume_generator, captcha_solver, email_verifier, etc. - which can be imported into a selenium project and used in-place.

I think this would be a good idea. It also helps people quickly make bots, and the tools would work with things other than selenium, in case selenium is too heavy for a particular application (~1 MB last time I checked).

@sneakers-the-rat
Author

yes! this is what i am doing ^^ will post here when i get a draft. Splitting into a bot that can take a set of selectors, an identity class that can do all the faking, and the resume generator with hooks for the identity class to use.

@sneakers-the-rat
Author

I also think that trying to abstract the process further would take a bit of development time, i'm thinking of a programming interface that would be v familiar/usable by nonprogrammers (click thing, wait, type thing, wait, switch tabs, wait) and then we can do further abstraction depending on patterns that emerge
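One way to picture the "click thing, wait, type thing, wait, switch tabs, wait" interface is a flat list of steps fed to a small interpreter. This is only a sketch under assumptions: run_steps, RecordingPage, and the verb names are hypothetical and do not come from autoscab.

```python
import time


class RecordingPage:
    """Hypothetical page object that records calls instead of driving a browser."""
    def __init__(self):
        self.calls = []

    def click(self, name):
        self.calls.append(("click", name))

    def type(self, name, text):
        self.calls.append(("type", name, text))

    def switch_tab(self, index):
        self.calls.append(("switch_tab", index))


def run_steps(page, steps, sleep=time.sleep):
    """Run each (verb, *args) step in order; sleep is injectable for testing."""
    for verb, *args in steps:
        if verb == "wait":
            sleep(*args)
        elif verb == "click":
            page.click(*args)
        elif verb == "type":
            page.type(*args)
        elif verb == "switch_tab":
            page.switch_tab(*args)
        else:
            raise ValueError(f"unknown step: {verb!r}")


steps = [
    ("click", "apply_button"),
    ("wait", 0.1),
    ("type", "first_name", "Ada"),
    ("wait", 0.1),
    ("switch_tab", 1),
]

page = RecordingPage()
# Skip real sleeping here so the example finishes immediately
run_steps(page, steps, sleep=lambda seconds: None)
```

A list like steps reads close to how someone would describe filling in the form by hand, which is the point of keeping the abstraction this thin at first.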

@pws1453
Contributor

pws1453 commented Dec 16, 2021

Being able to extract the work done here into either a generalized program or build out some of the subroutines into external libraries would be extremely beneficial. Is there a specific way we'd want to do this?

@sneakers-the-rat
Author

sneakers-the-rat commented Dec 16, 2021

> Being able to extract the work done here into either a generalized program or build out some of the subroutines into external libraries would be extremely beneficial. Is there a specific way we'd want to do this?

Am working on a draft over here, though will need another day to get a full version, sorry to be cryptic: https://github.com/sneakers-the-rat/autoscab

edit: have also fixed up the packaging and it's pypi-ready.

@sneakers-the-rat
Author

OK autoscab 0.2.0 is up now. I'm totally fuzzyheaded right now, but here's the basic organization:

  • postbot:PostBot is the main driver, it spawns the chromedriver and has some syntactic sugar to let people interact with the page by just using self.<element> calls, as well as a logger, etc. It takes...
    • a starting url,
    • a locator_dict -- a dictionary of {'name': (By.<SELECTOR_TYPE>, '<SELECTOR>')} , you can see an example in constants/locators (that empty class just gets turned into a dict like dict(LocatorClass.__dict__))
    • an Identity (or it makes one if none is provided, see below)
  • Identity class creates all the identity elements, including resume et al. Also added in some browser fingerprinting randomization.
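A rough sketch of how a locator class can become a locator_dict, and how self.<element> access might be wired up with __getattr__. Everything here is an assumption for illustration: Locator, Bot, StubDriver, and the selectors are hypothetical, and the real PostBot may do this differently. The ID and XPATH strings stand in for selenium's By.ID and By.XPATH, which are the plain strings "id" and "xpath".

```python
# Stand-ins for selenium.webdriver.common.by.By.ID / By.XPATH
ID = "id"
XPATH = "xpath"


class Locator:
    """Hypothetical locator class: attribute name -> (selector type, selector)."""
    first_name = (ID, "first-name")
    submit = (XPATH, '//button[@type="submit"]')


def locator_dict(cls):
    # vars(cls) is the same mapping as cls.__dict__; drop dunder entries
    # such as __module__ and __doc__ so only the locators remain
    return {k: v for k, v in vars(cls).items() if not k.startswith("__")}


class StubDriver:
    """Hypothetical driver that echoes the lookup instead of finding an element."""
    def find_element(self, by, value):
        return (by, value)


class Bot:
    def __init__(self, driver, locators):
        self.driver = driver
        self.locators = locators

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so driver and
        # locators themselves never pass through here once set
        locators = self.__dict__.get("locators", {})
        if name in locators:
            by, selector = locators[name]
            return self.driver.find_element(by, selector)
        raise AttributeError(name)


bot = Bot(StubDriver(), locator_dict(Locator))
found = bot.submit  # resolved through the locator dict
```

With a real WebDriver in place of StubDriver, bot.submit would return the element itself, so bot.submit.click() reads like plain attribute access.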

The basic pattern is to subclass PostBot with a series of actions to take to fill the form, using the locations in Locator, and then put them in the PostBot.apply method -- you can see an example in deployments/fredmeyer . It's a little awkward right now, but trying to get it out to the people in time for it to be useful on my end.

A Deployment consists of a name, a list of starting URLs, a Locator dictionary, and a subclassed PostBot. Any Deployment is picked up by the metaclass, so then the calling syntax is just

autoscab <DEPLOYMENT_NAME> 
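For readers unfamiliar with the "picked up by the metaclass" pattern, here is a minimal sketch of a registry metaclass. DeploymentMeta, get_deployment, the attribute names, and the example URL are assumptions for illustration and are not taken from the autoscab source; only the "fredmeyer" name comes from the example mentioned above.

```python
class DeploymentMeta(type):
    """Records every Deployment subclass under its name attribute."""
    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # skip the abstract base class itself
            DeploymentMeta.registry[cls.name] = cls


class Deployment(metaclass=DeploymentMeta):
    name = None
    urls = ()
    locators = {}
    bot_class = None


class FredMeyer(Deployment):
    name = "fredmeyer"
    urls = ("https://example.com/apply",)  # placeholder URL


def get_deployment(name):
    """Look up a deployment class by the name given on the command line."""
    return DeploymentMeta.registry[name]


chosen = get_deployment("fredmeyer")
```

Defining the subclass is enough to register it, which is why no explicit registration call is needed before running the command-line tool.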

I tried to leave in place a lot of what was here, but like i said am trying to get this out ASAP and figured we could cohere later. Also haven't pulled in any of y'all's work.

usage: APPLY FOR MANY OF THE SAME JOB [-h] [-n N] [--relentless] [--list] [--noheadless] [--leaveopen]
                                      [deployment]

positional arguments:
  deployment    Which deployment to run

optional arguments:
  -h, --help    show this help message and exit
  -n N          Apply for n jobs (default: 1)
  --relentless  Keep applying forever
  --list        List all available deployments and exit
  --noheadless  Show the chromium driver as it fills in the application
  --leaveopen   Try to leave the browser open after an application is completed

IF THEY WANT SCABS, WE'LL GIVE EM SCABS
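The help text above could be produced by an argparse setup roughly like the following. This is a reconstruction from the printed usage only: the flag names and help strings come from that output, while build_parser and the use of the closing slogan as the epilog are guesses.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="APPLY FOR MANY OF THE SAME JOB",
        # Assumption: the slogan under the help text is the parser epilog
        epilog="IF THEY WANT SCABS, WE'LL GIVE EM SCABS",
    )
    # nargs="?" makes the positional optional, matching "[deployment]"
    parser.add_argument("deployment", nargs="?",
                        help="Which deployment to run")
    parser.add_argument("-n", type=int, default=1,
                        help="Apply for n jobs (default: 1)")
    parser.add_argument("--relentless", action="store_true",
                        help="Keep applying forever")
    parser.add_argument("--list", action="store_true",
                        help="List all available deployments and exit")
    parser.add_argument("--noheadless", action="store_true",
                        help="Show the chromium driver as it fills in the application")
    parser.add_argument("--leaveopen", action="store_true",
                        help="Try to leave the browser open after an application is completed")
    return parser


# Equivalent to running: autoscab -n 3 --noheadless fredmeyer
args = build_parser().parse_args(["-n", "3", "--noheadless", "fredmeyer"])
```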

@sneakers-the-rat sneakers-the-rat changed the title this is sick autoscab - generalizing kelloggbot Dec 17, 2021