Add a scraper for Yahoo Finance #3

Closed
mrhappyasthma opened this issue Feb 2, 2019 · 14 comments

@mrhappyasthma
Owner

Particularly useful for the analysis. URL = https://finance.yahoo.com/quote/<symbol>/analysis.

Looking at the Next 5 Years (per annum).

@mrhappyasthma
Owner Author

We can use an XPath query to scrape it.

//table[last()]//tr[last()-1]//td[2]

https://finance.yahoo.com/quote/AMZN/analysis?p=AMZN
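
A minimal sketch of how that query selects the cell, using only the standard library. The markup and the values in it are made up for illustration, and `growth_estimate` is a hypothetical helper. Real Yahoo pages are not well-formed XML, so an HTML parser such as lxml would be needed there; ElementTree also requires the relative form of the path (leading `.`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the analysis page markup.
SAMPLE = """
<html><body>
  <table>
    <tr><td>Earnings Estimate</td><td>1.23</td></tr>
  </table>
  <table>
    <tr><td>Next Year</td><td>10.00%</td></tr>
    <tr><td>Next 5 Years (per annum)</td><td>25.50%</td></tr>
    <tr><td>Past 5 Years (per annum)</td><td>30.00%</td></tr>
  </table>
</body></html>
"""

def growth_estimate(markup):
    """Return the text of the 2nd cell in the second-to-last row of the last table."""
    root = ET.fromstring(markup.strip())
    # Same query as above, written as a relative path for ElementTree.
    cells = root.findall(".//table[last()]//tr[last()-1]//td[2]")
    return cells[0].text if cells else None

print(growth_estimate(SAMPLE))
```

The query depends entirely on table and row positions, so it breaks if Yahoo adds or reorders rows.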

@mrhappyasthma
Owner Author

Some info on the yahoo finance API: https://observablehq.com/@stroked/yahoofinance

@mrhappyasthma
Owner Author

mrhappyasthma commented Jan 19, 2021

regularMarketPrice and marketCap are available from query1.finance.yahoo.com/v7/finance/quote?fields=regularMarketPrice,marketCap&symbols=<symbol>.

The same endpoint also provides trailingAnnualDividendRate and dividendDate.
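
A rough sketch of assembling that quote URL. `build_quote_url` is a hypothetical helper, and joining several symbols with commas is an assumption based on the plural `symbols` parameter name, not something confirmed here.

```python
from urllib.parse import urlencode

QUOTE_ENDPOINT = "https://query1.finance.yahoo.com/v7/finance/quote"

def build_quote_url(symbols, fields):
    """Build a quote URL for the given ticker symbols and field names."""
    query = urlencode(
        {"fields": ",".join(fields), "symbols": ",".join(symbols)},
        safe=",",  # keep commas literal, matching the URL form quoted above
    )
    return f"{QUOTE_ENDPOINT}?{query}"

print(build_quote_url(
    ["MSFT"],
    ["regularMarketPrice", "marketCap", "trailingAnnualDividendRate", "dividendDate"],
))
```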

For company info and SEC filings (sector, website, industry, longBusinessSummary, companyOfficers):

https://query1.finance.yahoo.com/v10/finance/quoteSummary/MSFT?modules=assetProfile,secFilings

Other modules are:

modules = Array(26) [
  0: "assetProfile"
  1: "incomeStatementHistory"
  2: "incomeStatementHistoryQuarterly"
  3: "balanceSheetHistory"
  4: "balanceSheetHistoryQuarterly"
  5: "cashFlowStatementHistory"
  6: "cashFlowStatementHistoryQuarterly"
  7: "defaultKeyStatistics"
  8: "financialData"
  9: "calendarEvents"
  10: "secFilings"
  11: "recommendationTrend"
  12: "upgradeDowngradeHistory"
  13: "institutionOwnership"
  14: "fundOwnership"
  15: "majorDirectHolders"
  16: "majorHoldersBreakdown"
  17: "insiderTransactions"
  18: "insiderHolders"
  19: "netSharePurchaseActivity"
  20: "earnings"
  21: "earningsHistory"
  22: "earningsTrend"
  23: "industryTrend"
  24: "indexTrend"
  25: "sectorTrend"
]
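
A URL requesting several of the modules listed above could be assembled along these lines. This is only a sketch: `build_quote_summary_url` is a hypothetical helper, and it does no checking that the module names are ones the endpoint accepts.

```python
from urllib.parse import quote, urlencode

QUOTE_SUMMARY_ENDPOINT = "https://query1.finance.yahoo.com/v10/finance/quoteSummary"

def build_quote_summary_url(symbol, modules):
    """Build a quoteSummary URL for one symbol and a list of module names."""
    if not modules:
        raise ValueError("at least one module is required")
    # Escape the symbol so characters such as '^' are safe in the path.
    path = quote(symbol, safe="")
    query = urlencode({"modules": ",".join(modules)}, safe=",")
    return f"{QUOTE_SUMMARY_ENDPOINT}/{path}?{query}"

print(build_quote_summary_url("MSFT", ["assetProfile", "secFilings"]))
```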

Ex-dividend date comes from calendarEvents.

Cash on hand comes from balanceSheetHistory. - #22

@mrhappyasthma
Owner Author

The only thing I can't figure out how to get yet (which I need) is Next 5 Years (per annum). This is used as part of the calculations to determine pricing.

@mrhappyasthma
Owner Author

This value is included in the main page response, so we can simply fetch the analysis page by URL.

The JSON is embedded in the page as the React root.App.main= assignment.
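
One possible way to pull that JSON out of the fetched page, sketched with the standard library only. The sample page and the keys inside it are invented for illustration, and `extract_app_state` is a hypothetical helper. The real payload layout is not shown here.

```python
import json

MARKER = "root.App.main"

def extract_app_state(page_html):
    """Return the object assigned to root.App.main, or None if it is absent."""
    marker_pos = page_html.find(MARKER)
    if marker_pos == -1:
        return None
    brace_pos = page_html.find("{", marker_pos)
    if brace_pos == -1:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it,
    # so the trailing ";" and the rest of the script do not matter.
    state, _end = json.JSONDecoder().raw_decode(page_html, brace_pos)
    return state

# Hypothetical page fragment; the keys are placeholders, not Yahoo's schema.
SAMPLE_PAGE = """
<html><body><script>
(function (root) {
root.App.main = {"context": {"growth": "25.50%"}};
}(this));
</script></body></html>
"""

print(extract_app_state(SAMPLE_PAGE))
```

Using `raw_decode` avoids writing a regex that has to guess where the JSON object ends.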

https://stackoverflow.com/a/39635322/1366973

@mrhappyasthma
Owner Author

mrhappyasthma commented May 14, 2021

I'm not entirely sure why, but a local test works fine, whereas the same code ported to run on the server does not find the string in the output.

import re

import lxml.html as html
import requests

def isPercentage(text):
  # Element text can be None, so guard before matching.
  return text is not None and re.match(r'\d+(\.\d+)?%', text) is not None

def parseNextPercentage(iterator):
  # Advance the shared iterator to the next element whose text is a percentage.
  for node in iterator:
    if isPercentage(node.text):
      return node.text
  return None  # No percentage found after the label.

r = requests.get('https://finance.yahoo.com/quote/FB/analysis?p=FB')
tree = html.fromstring(r.content)
tree_iterator = tree.iter()
for element in tree_iterator:
  if element.text == 'Next 5 Years (per annum)':
    print(parseNextPercentage(tree_iterator))

@mrhappyasthma
Owner Author

Oh, it was a copy paste error. Of course :P

@mrhappyasthma
Owner Author

Work mostly complete in 250485a.

@mrhappyasthma
Owner Author

As of afa1844, the code is being used to calculate margin of safety.

I still need to parse the current price from the quote and display that.

@mrhappyasthma
Owner Author

Quote scraping added in 155a766.

The only thing still needed (although I don't have an immediate use for it) is fetching quoteSummary modules: #3 (comment)

@mrhappyasthma
Owner Author

Started the implementation here: 5bf4a85

@mrhappyasthma
Owner Author

It seems like the parsing can be done along the lines of this:

results = data['quoteSummary']['result']
moduleData = {}
for module in self.modules:
  for result in results:
    if module in result:
      moduleData[module] = result[module]
      break

This should produce a dictionary keyed by module name, with each value being that module's data from the result.

@mrhappyasthma
Owner Author

Reading data from a file, I confirmed this approach works:

import json

# Use a context manager so the file is closed after reading.
with open("temp.txt", "r") as f:
  data = json.load(f)

results = data['quoteSummary']['result']
modules = ['assetProfile', 'secFilings', 'financialData']
moduleData = {}
for module in modules:
  for result in results:
    if module in result:
      moduleData[module] = result[module]
      break

for key in moduleData:
  print(key)
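
The same approach can be sketched as a reusable function that runs without temp.txt. `extract_modules` is a hypothetical name, and the sample response below is a made-up stand-in that only mirrors the `quoteSummary` / `result` nesting used above. The values inside each module are placeholders.

```python
def extract_modules(data, modules):
    """Map each requested module name to its data, skipping missing modules."""
    # Treat a missing or null result list as empty (an assumption about error cases).
    results = data.get("quoteSummary", {}).get("result") or []
    module_data = {}
    for module in modules:
        for result in results:
            if module in result:
                module_data[module] = result[module]
                break  # first match wins, as in the loop above
    return module_data

# Hypothetical payload; real responses carry far more fields.
SAMPLE_RESPONSE = {
    "quoteSummary": {
        "result": [
            {
                "assetProfile": {"sector": "Technology"},
                "secFilings": {"filings": []},
            }
        ]
    }
}

found = extract_modules(SAMPLE_RESPONSE, ["assetProfile", "secFilings", "financialData"])
for key in found:
    print(key)
```

Modules that were requested but not returned (financialData in this sample) are left out of the dictionary rather than raising.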
