getting an error on line 95 #1

Open
alias-noa opened this issue Dec 31, 2020 · 5 comments

Comments

@alias-noa

Traceback (most recent call last):
  File "C:/Users/Noa/PycharmProjects/sec_scraper_master/scraper.py", line 95, in <module>
    s = s + "/" + newLink
NameError: name 'newLink' is not defined

All I did was try to run it on TEVA instead of AAPL

@alias-noa
Author

What is the proper way to run this over several stocks? I just changed line 44 so maybe that's why I'm getting this error.

@alias-noa
Author

Actually, how do I even run this thing? I thought I was supposed to run scraper.py, but I'm thinking that's not the correct way. There's no main.py, so how do I run it?

@alias-noa
Author

Tried running multi and got a ton of crazy errors...

@hmcguinn
Owner

hmcguinn commented Dec 31, 2020

Hey @alias-noa! This repo hasn't exactly been in production-shape :) I've just worked around the errors and don't have them pushed I think. Would you be able to copy the errors you received? I'll clean up the repo and add another comment in a little bit.

Glad you found the repo useful enough to give it a shot!

@hmcguinn
Owner

A little bit more detailed comment on usage:

The scraper is set up as a script that you run from the shell. The file I use to run it is /multiThreading/multi.py. That file reads in a list of CIKs from /multiThreading/cik.csv. If you need something to map between CIKs and tickers you can find that here.
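For anyone following along, here is a minimal sketch of reading a CIK list from a CSV file. It is not the code in multi.py. The one-CIK-per-row layout, the optional header row, the `read_ciks` name, and the sample values are all assumptions made for illustration.

```python
import csv
import io


def read_ciks(handle):
    """Collect CIKs from the first column of a CSV, zero-padded to 10 digits.

    Assumption: one CIK per row in the first column; the real cik.csv
    layout may differ.
    """
    ciks = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        value = row[0].strip()
        if value.isdigit():  # skips a header row or an empty cell
            ciks.append(value.zfill(10))
    return ciks


# Sample input standing in for cik.csv (values are illustrative only).
sample = io.StringIO("cik\n320193\n1234567\n")
ciks = read_ciks(sample)
```

With a real file, the same function could be called as `read_ciks(open(path, newline=""))`, where `path` points at the CSV.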

From there, the scraper searches through the filings for a company (viewable here). As of now, it is configured to only grab Form 3 and Form 4 filings (Initial Statement of Beneficial Ownership of Securities and Statement of Changes in Beneficial Ownership, respectively). That code can be found on lines 84-95 of /multiThreading/getList.py.
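As a rough illustration of that filtering step, the sketch below keeps only entries whose form type is 3 or 4. It is not the code from getList.py. The dictionary shape, the `keep_ownership_filings` name, and the accession values are hypothetical.

```python
# Form types to keep. Amendments such as "4/A" are deliberately left out
# here; whether the repo keeps them is not stated in this thread.
WANTED_FORMS = {"3", "4"}


def keep_ownership_filings(filings):
    """Return only the filings whose form type is exactly 3 or 4."""
    return [f for f in filings if f["form"].strip() in WANTED_FORMS]


# Hypothetical filing index entries for one company.
filings = [
    {"form": "4", "accession": "sample-0001"},
    {"form": "10-K", "accession": "sample-0002"},
    {"form": "3", "accession": "sample-0003"},
    {"form": "4/A", "accession": "sample-0004"},
]
kept = keep_ownership_filings(filings)
```

Grabbing other filing types would then come down to changing the `WANTED_FORMS` set.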

The code to actually grab info from the filings in XML form is in /multiThreading/runScraper.py. Currently, it's limited in what it grabs but can be configured easily to grab whatever you want from the filing. The scraper stores all the filings associated with a company in a pandas DataFrame before writing it out to an Excel file.
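To make the "grab whatever you want" idea concrete, here is a small sketch that pulls a few fields out of an ownership filing's XML using only the standard library. It is not the code in runScraper.py. The sample document is made up and trimmed, and the `FIELDS` mapping, the `filing_to_row` name, and the element paths are assumptions modeled on the general shape of these filings.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed stand-in for a Form 4 XML document.
SAMPLE_XML = """<ownershipDocument>
  <documentType>4</documentType>
  <periodOfReport>2020-12-01</periodOfReport>
  <issuer>
    <issuerCik>0000000000</issuerCik>
    <issuerTradingSymbol>XMPL</issuerTradingSymbol>
  </issuer>
  <reportingOwner>
    <reportingOwnerId>
      <rptOwnerName>DOE JANE</rptOwnerName>
    </reportingOwnerId>
  </reportingOwner>
</ownershipDocument>"""

# Output column name -> element path relative to the document root.
# Adding a column means adding one entry here.
FIELDS = {
    "form": "documentType",
    "period": "periodOfReport",
    "ticker": "issuer/issuerTradingSymbol",
    "owner": "reportingOwner/reportingOwnerId/rptOwnerName",
}


def filing_to_row(xml_text):
    """Turn one filing's XML into a flat dict, one key per configured field."""
    root = ET.fromstring(xml_text)
    # findtext returns None when a path is missing, so absent fields stay empty.
    return {column: root.findtext(path) for column, path in FIELDS.items()}


rows = [filing_to_row(SAMPLE_XML)]
```

A list of rows like this can be handed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_excel`, which needs an Excel writer engine such as openpyxl installed.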

Hope that helps to shed a little bit more light on what the code does! It's not exactly the most readable thing... I'll get around to cleaning it up at some point, hopefully.

I also went ahead and made a couple changes to the repo. It should work after a pull now.

Thanks for giving it a try!
