# Analyst Forecasts

The objective of this notebook is to scrape analyst forecasts daily from multiple sources and to understand what the forecasts imply and what drives them.

Once we have enough data, we can see which forecasters are accurate and which aren't, and how much a change in an analyst's forecast moves the market over the following two weeks or so.
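
One way to make "good" concrete is to score each source by how far its past targets landed from the price that was later realized. The sketch below uses mean absolute percentage error. The function name and the toy numbers are hypothetical, not scraped data.

```python
def mean_abs_pct_error(pairs):
    """Mean absolute percentage error over (target_price, realized_price) pairs."""
    errors = [abs(target - realized) / realized for target, realized in pairs]
    return sum(errors) / len(errors)

# Hypothetical (target, realized) pairs for two sources, for illustration only.
history = {
    'source_a': [(110.0, 100.0), (45.0, 50.0)],
    'source_b': [(150.0, 100.0), (20.0, 50.0)],
}

# Lower score means the source's targets were closer to what happened.
scores = {name: mean_abs_pct_error(pairs) for name, pairs in history.items()}
best = min(scores, key=scores.get)
```

A real version would need one realized price per forecast at a fixed horizon, which is why the daily scrape has to run for a while first.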

Maybe we should separate our analysis based on market cap / maturity of companies.
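
A minimal sketch of that split, assuming the commonly quoted small/mid/large cutoffs of 2B and 10B USD (an assumption, not something this notebook has settled on). The `cap_bucket` helper is hypothetical; the sample values are the `Market Cap` entries for ALEC, HUYA, and AMD from the screener output further down.

```python
def cap_bucket(market_cap, small_max=2e9, mid_max=10e9):
    """Label a market cap (in USD) as 'small', 'mid', or 'large'.

    The cutoffs are assumed defaults and should be tuned to the analysis.
    """
    if market_cap < small_max:
        return 'small'
    if market_cap < mid_max:
        return 'mid'
    return 'large'

# Market caps taken from the CNN/screener table later in this notebook.
sample_caps = {'ALEC': 1.152981e9, 'HUYA': 4.470653e9, 'AMD': 1.131030e11}
buckets = {symbol: cap_bucket(cap) for symbol, cap in sample_caps.items()}
```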

How much do forecasters disagree?
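
One simple disagreement measure is the spread between the highest and lowest target, scaled by the median target. This is only a sketch with a hypothetical `target_dispersion` helper; the inputs are the TSLA and ALEC rows of the CNN output shown later in this notebook.

```python
def target_dispersion(high, low, median):
    """(high - low) / median: 0 means full agreement, larger means more disagreement."""
    return (high - low) / median

# high / low / median targets as scraped for TSLA and ALEC
tsla = target_dispersion(high=800.00, low=40.00, median=430.00)
alec = target_dispersion(high=44.00, low=28.00, median=31.50)

print(f'TSLA {tsla:.2f}, ALEC {alec:.2f}')  # TSLA 1.77, ALEC 0.51
```

On these two rows, analysts covering TSLA are spread more than three times as widely as those covering ALEC.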

#### Forecasts
* CNN Forecasts - Fix some minor mistakes
* WallstreetZen
* MarketBeat
* TipRanks
* NewsQuantified - https://www.newsquantified.com/aapl/analysts
* Benzinga - https://www.benzinga.com/stock/aapl/ratings

#### Insiders
* Morningstar

#### Open Interest
* https://www.barchart.com/stocks/quotes/PLTR/options

# TODO

Scrape all forecasts before doing any further analysis.<br>
CNN forecasters look unreliable overall. Maybe they're good at some particular sector?

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re
from wayback import WaybackClient
import datetime
from tqdm.notebook import tqdm
wbclient = WaybackClient()

In [2]:
nasdaq_url = 'https://www.nasdaq.com/market-activity/stocks/screener'

In [16]:
stocks = pd.read_csv('nasdaq_screener_1607314349083.csv')
stocks = stocks[stocks['Market Cap'] > 1e+8] # 100m

In [4]:
BACKPOPULATE = False

In [17]:
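companies = ['NIO']  # single ticker for quick tests; overwritten by the full list below
companies = list(stocks['Symbol'].unique())
companies = [x for x in companies if '.' not in x]  # drop tickers containing a dot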

In [20]:
len(companies)

4694

# CNN Forecasts

In [None]:
# TODO: fix the numbers that are parsed incorrectly.

In [15]:
urlroot = 'https://money.cnn.com/quote/forecast/forecast.html?symb='

In [288]:
def extract_forecast_cnn(url, comp):
    # `url` is either a live URL string or a Wayback record
    # (with .timestamp and .view_url) yielded by wbclient.search().
    if isinstance(url, str):
        timestamp = datetime.datetime.now()
    else:
        timestamp = pd.to_datetime(url.timestamp).tz_localize(None)
        url = url.view_url
    try:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, features='lxml')
        forecast_text = [x.text for x in soup.find_all('p') if 
                         re.search(r'The [0-9]* analysts offering 12-month price forecasts', x.text)][0]
        # Positional parsing: index 1 is the "12-" of "12-month". Any stray digits,
        # dots, or hyphens elsewhere in the sentence shift the later indices.
        forecast_nums = re.findall(r'[0-9.%+-]+', forecast_text)
        forecast_info = {
            'company': comp,
            'timestamp': timestamp,
            'num_analysts': forecast_nums[0],
            'last_px': forecast_nums[6].strip('.'),
            'median': forecast_nums[2],
            'high': forecast_nums[3],
            'low': forecast_nums[4].strip('.'),
            'increase_from_last': forecast_nums[5],
        }
        try:
            low = float(forecast_info['low'])
            last_px = float(forecast_info['last_px'])
            # Percentage move from the last price to the lowest target.
            forecast_info['increase_to_low'] = round((low - last_px) / last_px * 100, 2)
        except (ValueError, ZeroDivisionError):
            forecast_info['increase_to_low'] = None
        return forecast_info
    except IndexError:
        print(comp, f'has some index error on {timestamp}')
        return None

def get_forecasts_cnn(comp):
    url = urlroot + comp
    forecasts = []
    if BACKPOPULATE:
        try:
            # search() is a lazy generator, so materialize it inside the try.
            urls = list(wbclient.search(url))
        except Exception as e:
            print(e)
            return []
        for hist_url in urls:
            extraction = extract_forecast_cnn(hist_url, comp)
            if extraction:
                forecasts.append(extraction)
    # Always try the live page too, skipping it if parsing failed.
    current = extract_forecast_cnn(url, comp)
    if current:
        forecasts.append(current)
    return forecasts
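
The positional indexing in `extract_forecast_cnn` breaks whenever the sentence contains extra digits, dots, or hyphens. A more defensive sketch is to anchor each number to the words around it. The sentence template in `CNN_PATTERN` is an assumption reconstructed from the regex above and from the field order the function uses, so it should be checked against a real page before relying on it. `parse_cnn_sentence` is a hypothetical helper.

```python
import re

# Assumed sentence shape (not verified against the live page):
# "The N analysts offering 12-month price forecasts for <name> have a median
#  target of A, with a high estimate of B and a low estimate of C. The median
#  estimate represents a +D% increase from the last price of E."
NUM = r'[0-9][0-9,]*(?:\.[0-9]+)?'
CNN_PATTERN = re.compile(
    rf'The (?P<num_analysts>[0-9]+) analysts offering 12-month price forecasts'
    rf'.*?median target of (?P<median>{NUM})'
    rf'.*?high estimate of (?P<high>{NUM})'
    rf'.*?low estimate of (?P<low>{NUM})'
    rf'.*?represents an? (?P<increase_from_last>[+-]?{NUM}%)'
    rf'.*?last price of (?P<last_px>{NUM})',
    re.DOTALL,
)

def parse_cnn_sentence(text):
    """Return the forecast fields as a dict, or None if the template does not match."""
    match = CNN_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ('median', 'high', 'low', 'last_px'):
        fields[key] = float(fields[key].replace(',', ''))
    fields['num_analysts'] = int(fields['num_analysts'])
    return fields
```

Because every number is tied to its label, a ticker or company name containing digits no longer shifts the fields, and a page that does not match returns `None` instead of silently wrong values.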

In [295]:
url = urlroot + 'XPEV'
if type(url) == str:
    timestamp = datetime.datetime.now()
else:
    timestamp = pd.to_datetime(url.timestamp).tz_localize(None)
    url = url.view_url
page = requests.get(url)
soup = BeautifulSoup(page.text, features='lxml')
forecast_text = [x.text for x in soup.find_all('p') if 
                 re.search('The [0-9]* analysts offering 12-month price forecasts', x.text)][0]
forecast_nums = re.findall('[0-9.%+-]+', forecast_text)

In [307]:
soup.time

In [284]:
from multiprocessing import Pool

In [None]:
flatten = lambda t: [item for sublist in t for item in sublist]

with Pool() as pool:
    forecasts = tqdm(pool.imap_unordered(get_forecasts_cnn, companies), 
                              total=len(companies))
    
    forecasts = flatten([x for x in forecasts if x])
    forecasts = pd.DataFrame(forecasts)

In [None]:
clean = forecasts[forecasts['increase_from_last'].str.contains('%')].copy()
clean['increase'] = clean['increase_from_last'].str.strip('%+').astype(float)
clean = clean.drop('timestamp', axis=1)
clean = clean.sort_values('increase', ascending=False)
clean = clean.rename({"company": "Symbol"}, axis=1)
clean = pd.merge(clean, stocks, how='left', on='Symbol')

# Idea: restrict to market caps > 1B. What is actually filtered here is num_analysts > 5.
clean = clean[clean['num_analysts'].astype(int) > 5].copy()
# Fractional move from the last price to the lowest target.
clean['diff'] = ((clean['low'].astype(float) - clean['last_px'].astype(float)) / 
                 clean['last_px'].astype(float))

clean = clean.sort_values(by='diff', ascending=False)

In [220]:
cn = clean[clean.Country=="United States"]
cn = cn[~cn.Name.str.contains("epositary")]

In [287]:
cn

| | Symbol | num_analysts | last_px | median | high | low | increase_from_last | increase | Name | Last Sale | Net Change | % Change | Market Cap | Country | IPO Year | Volume | Sector | Industry | diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | HUYA | 13 | 19.00 | 169.47 | 198.60 | 145.64 | +791.96% | 791.96 | HUYA Inc. American depositary shares each rep... | $19.00 | -0.39 | -2.011% | 4.470653e+09 | United States | 2018.0 | 3888843 | Technology | Computer Software: Programming Data Processing | 6.665263 |
| 515 | ALEC | 8 | 14.55 | 31.50 | 44.00 | 28.00 | +116.49% | 116.49 | Alector Inc. Common Stock | $14.55 | 0.89 | 6.515% | 1.152981e+09 | United States | 2019.0 | 539444 | Health Care | Biotechnology: Biological Products (No Diagnos... | 0.924399 |
| 535 | LGND | 6 | 86.55 | 182.50 | 229.00 | 146.00 | +110.86% | 110.86 | Ligand Pharmaceuticals Incorporated Common Stock | $86.56 | 1.18 | 1.382% | 1.391999e+09 | United States | | 188610 | Health Care | Major Pharmaceuticals | 0.686886 |
| 606 | ADVM | 7 | 13.72 | 27.00 | 36.00 | 21.00 | +96.79% | 96.79 | Adverum Biotechnologies Inc. Common Stock | $13.71 | -0.55 | -3.857% | 1.336418e+09 | United States | 2014.0 | 608000 | Health Care | Biotechnology: Biological Products (No Diagnos... | 0.530612 |
| 790 | RETA | 7 | 141.12 | 234.00 | 320.00 | 212.00 | +65.82% | 65.82 | Reata Pharmaceuticals Inc. Class A Common Stock | $141.15 | 1.41 | 1.009% | 4.815438e+09 | United States | 2016.0 | 395928 | Health Care | Major Pharmaceuticals | 0.502268 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4109 | OCUL | 6 | 20.35 | 15.50 | 25.00 | 3.00 | -23.83% | -23.83 | Ocular Therapeutix Inc. Common Stock | $20.34 | 1.34 | 7.053% | 1.452101e+09 | United States | 2014.0 | 1497258 | Health Care | Major Pharmaceuticals | -0.852580 |
| 3021 | AMD | 32 | 94.02 | 95.00 | 120.00 | 13.00 | +1.04% | 1.04 | Advanced Micro Devices Inc. Common Stock | $94.04 | 1.73 | 1.874% | 1.131030e+11 | United States | | 45532244 | Technology | Semiconductors | -0.861732 |
| 4285 | GME | 9 | 16.91 | 6.00 | 19.00 | 1.60 | -64.52% | -64.52 | GameStop Corporation Common Stock | $16.90 | 0.78 | 4.839% | 1.101231e+09 | United States | 2002.0 | 8957737 | Consumer Services | Electronics Distribution | -0.905381 |
| 4161 | TSLA | 33 | 599.00 | 430.00 | 800.00 | 40.00 | -28.21% | -28.21 | Tesla Inc. Common Stock | $599.04 | 5.66 | 0.954% | 5.678305e+11 | United States | 2010.0 | 29231475 | Capital Goods | Auto Manufacturing | -0.933222 |


In [289]:
BACKPOPULATE = True
forecasts = []
for comp in tqdm(['HUYA', 'ALEC', 'LGND', 'ADVM', 'GME']):
    forecasts.extend(get_forecasts_cnn(comp))
forecasts = pd.DataFrame(forecasts)


## MarketBeat

Different things we can scrape: 
* Summary of ratings
* Per broker analysis
* Range of Earnings Estimates 
* Buying and Selling: https://www.marketbeat.com/stocks/NASDAQ/AAPL/institutional-ownership/

In [42]:
mktbturl = lambda comp: f'https://www.marketbeat.com/stocks/NASDAQ/{comp.upper()}/price-target/?MostRecent=0'

In [23]:
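# TODO: placeholder copied from the CNN scraper; it still parses the CNN sentence, not MarketBeat's tables.
def extract_forecast_cnn(url, comp):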
    if type(url) == str:
        page = requests.get(url)
        timestamp = datetime.datetime.now()
    else:
        page = requests.get(url.view_url)
        timestamp = pd.to_datetime(url.timestamp).tz_localize(None)
    soup = BeautifulSoup(page.text, features='lxml')
    forecast_text = [x.text for x in soup.find_all('p') if 
                     re.search('The [0-9]* analysts offering 12-month price forecasts', x.text)][0]
    forecast_nums = re.findall('[0-9.%+-]+', forecast_text)
    forecast_info = {
        'company': comp,
        'timestamp': timestamp,
        'num_analysts': forecast_nums[0],
        'median': forecast_nums[2],
        'high': forecast_nums[3],
        'low': forecast_nums[4].strip('.'),
        'increase_from_last': forecast_nums[5],
        'last_px': forecast_nums[6].strip('.')
    }
    return forecast_info

In [41]:
url

'https://www.marketbeat.com/stocks/NYSE/AAPL/price-target/?MostRecent=0'

In [43]:
url = mktbturl('AAPL')
page = requests.get(url)
soup = BeautifulSoup(page.text, features='lxml')

for x in soup.find_all('tbody')[1]:
    print(x.text, '\n')

In [76]:
for td in x.find_all("td"):
    print(td.text)

12/7/2017
KeyCorp
Reiterated Rating
Buy
$48.00
Low



In [113]:
rows = []
table = soup.find_all('table')[0]
columns = [th.text for th in table.find_all('th')]
for tr in table.find_all('tr'):
    # The header row has no <td> cells, so it yields an empty first row.
    rows.append([td.text for td in tr.find_all('td')])
pd.DataFrame(rows, columns=columns)

| | | Today | 30 Days Ago | 90 Days Ago | 180 Days Ago |
|---|---|---|---|---|---|
| 0 | | | | | |
| 1 | AAPL Consensus Rating: | Buy | Buy | Hold | Buy |
| 2 | AAPL Consensus Rating Score: | 2.56 | 2.53 | 2.49 | 2.58 |
| 3 | AAPL Analyst Ratings: | 3 Sell Rating(s)14 Hold Rating(s)28 Buy Rating... | 3 Sell Rating(s)15 Hold Rating(s)27 Buy Rating... | 4 Sell Rating(s)15 Hold Rating(s)26 Buy Rating... | 3 Sell Rating(s)12 Hold Rating(s)28 Buy Rating... |
| 4 | AAPL Consensus Price Target: | $112.94 | $111.48 | $100.74 | $76.70 |
| 5 | AAPL Price Target Upside: | 8.74% downside | 2.84% downside | 6.75% downside | 2.73% upside |
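
The `Analyst Ratings` cells arrive as one run-together string because the `<br/>` separators contribute nothing to `.text`. A small sketch for turning such cells into numbers follows; `parse_rating_counts` and `parse_upside` are hypothetical helpers, and the input strings follow the format visible in the output above.

```python
import re

RATING_RE = re.compile(r'(\d+) (Strong Buy|Buy|Hold|Sell) Rating\(s\)')

def parse_rating_counts(cell_text):
    """'3 Sell Rating(s)14 Hold Rating(s)...' -> {'Sell': 3, 'Hold': 14, ...}"""
    return {label: int(count) for count, label in RATING_RE.findall(cell_text)}

def parse_upside(cell_text):
    """Signed percentage: upside is positive, downside negative, None if absent."""
    match = re.search(r'([0-9.]+)%\s*(upside|downside)', cell_text)
    if match is None:
        return None
    value = float(match.group(1))
    return value if match.group(2) == 'upside' else -value

counts = parse_rating_counts(
    '3 Sell Rating(s)14 Hold Rating(s)28 Buy Rating(s)0 Strong Buy Rating(s)'
)
today_upside = parse_upside('8.74% downside')
```

Storing the counts and the signed upside as numeric columns makes the 30/90/180-day snapshots directly comparable.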


In [82]:
sm = soup.find_all('tbody')[0]  # first <tbody>: the consensus summary table

In [86]:
sm.find_all('tr')

[<tr><td style="vertical-align:top"><strong>AAPL Consensus Rating: </strong></td><td>Buy</td><td>Buy</td><td>Hold</td><td>Buy</td></tr>,
 <tr><td style="vertical-align:top"><strong>AAPL Consensus Rating Score: </strong></td><td>2.56</td><td>2.53</td><td>2.49</td><td>2.58</td></tr>,
 <tr><td style="vertical-align:top"><strong>AAPL Analyst Ratings: </strong></td><td>3 Sell Rating(s)<br/>14 Hold Rating(s)<br/>28 Buy Rating(s)<br/>0 Strong Buy Rating(s)</td><td>3 Sell Rating(s)<br/>15 Hold Rating(s)<br/>27 Buy Rating(s)<br/>0 Strong Buy Rating(s)</td><td>4 Sell Rating(s)<br/>15 Hold Rating(s)<br/>26 Buy Rating(s)<br/>0 Strong Buy Rating(s)</td><td>3 Sell Rating(s)<br/>12 Hold Rating(s)<br/>28 Buy Rating(s)<br/>0 Strong Buy Rating(s)</td></tr>,
 <tr><td style="vertical-align:top"><strong>AAPL Consensus Price Target: </strong></td><td>$112.94</td><td>$111.48</td><td>$100.74</td><td>$76.70</td></tr>,
 <tr><td style="vertical-align:top"><strong>AAPL Price Target Upside: </strong></td><td>8.74%

In [44]:
for p in soup.find_all('p')[4:]:
    print(p, '\n')

<p>Sign up for MarketBeat All Access to gain access to MarketBeat's full suite of research tools:</p> 

<p>View the latest news, buy/sell ratings, SEC filings and insider transactions for your stocks. Compare your portfolio performance to leading indices and get personalized stock ideas based on your portfolio.</p> 

<p>Get daily stock ideas top-performing Wall Street analysts. Get short term trading ideas from the MarketBeat Idea Engine. View which stocks are hot on social media with MarketBeat's trending stocks report.</p> 

<p>Identify stocks that meet your criteria using seven unique stock screeners. See what's happening in the market right now with MarketBeat's real-time news feed. Export data to Excel for your own analysis. </p> 

<p>Learn everything you need to know about successful options trading with this three-part video course.</p> 

<p>Looking for new stock ideas? Want to see which stocks are moving? View our full suite of financial calendars and market data tables, all fo

In [28]:
url

'https://www.marketbeat.com/stocks/NYSE/AAPL/price-target/?MostRecent=0'

In [36]:
urls

<generator object WaybackClient.search at 0x7f9fd0d46450>

## TipRanks

TipRanks is rendered with JavaScript, so a plain `requests.get` does not return the forecast data; we need a rendering service such as [Splash](https://github.com/scrapinghub/splash) or [scrapy-splash](https://github.com/scrapy-plugins/scrapy-splash).
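
For reference, Splash exposes an HTTP endpoint, `render.html`, that returns the page after JavaScript has run. The sketch below assumes a Splash instance is already listening on `localhost:8050` (its default port, for example started via Docker). The helper names and the 2-second `wait` are assumptions to tune.

```python
import urllib.parse
import urllib.request

def splash_render_url(target_url, splash_base='http://localhost:8050', wait=2.0):
    """Build the Splash render.html request URL for a target page."""
    query = urllib.parse.urlencode({'url': target_url, 'wait': wait})
    return f'{splash_base}/render.html?{query}'

def fetch_rendered_html(target_url, **kwargs):
    """Fetch the JavaScript-rendered HTML; requires a running Splash instance."""
    request_url = splash_render_url(target_url, **kwargs)
    with urllib.request.urlopen(request_url, timeout=60) as response:
        return response.read().decode('utf-8', errors='replace')

# Building the URL needs no network access; fetching does.
request_url = splash_render_url('https://www.tipranks.com/stocks/nio/forecast')
```

The returned HTML can then be handed to `BeautifulSoup` in the same way as the other sources.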

In [38]:
tiprankurl = lambda comp: f'https://www.tipranks.com/stocks/{comp}/forecast'

In [None]:
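# TODO: placeholder copied from the CNN scraper; it still parses the CNN sentence, not the TipRanks page.
def extract_forecast_cnn(url, comp):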
    if type(url) == str:
        page = requests.get(url)
        timestamp = datetime.datetime.now()
    else:
        page = requests.get(url.view_url)
        timestamp = pd.to_datetime(url.timestamp).tz_localize(None)
    soup = BeautifulSoup(page.text, features='lxml')
    forecast_text = [x.text for x in soup.find_all('p') if 
                     re.search('The [0-9]* analysts offering 12-month price forecasts', x.text)][0]
    forecast_nums = re.findall('[0-9.%+-]+', forecast_text)
    forecast_info = {
        'company': comp,
        'timestamp': timestamp,
        'num_analysts': forecast_nums[0],
        'median': forecast_nums[2],
        'high': forecast_nums[3],
        'low': forecast_nums[4].strip('.'),
        'increase_from_last': forecast_nums[5],
        'last_px': forecast_nums[6].strip('.')
    }
    return forecast_info

In [45]:
urls = wbclient.search(tiprankurl(comp))

In [46]:
urls = [x for x in urls]

In [58]:
urls[0].view_url

'http://web.archive.org/web/20200822161720/https://www.tipranks.com/stocks/nio/forecast'

In [72]:
page = requests.get(tiprankurl(comp))
timestamp = pd.to_datetime(urls[0].timestamp).tz_localize(None)
soup = BeautifulSoup(page.text, 'html.parser')

In [75]:
soup

<!DOCTYPE html>

<html>
<head style="display: none" typeof="og:article">
<meta content="IE=EDGE" http-equiv="X-UA-Compatible"/>
<meta content="tipranks://" property="al:ios:url"/>
<meta content="1237516490" property="al:ios:app_store_id"/>
<meta content="Smart Portfolio" property="al:ios:app_name"/>
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport"/>
<meta charset="utf-8"/>
<link href="/favicon.ico" rel="icon"/>
<link href="//trcdn.tipranks.com/website/e9e808882f53ee2f2e63.styles.css" rel="stylesheet"/>
<noscript>
<style>
      .hide-no-js {
        display: none;
      }

      .no-opacity-no-js {
        opacity: 1;
      }
    </style>
</noscript>
<base href="/"/>
<!-- Google Tag Manager -->
<noscript>
<iframe height="0" src="//www.googletagmanager.com/ns.html?id=GTM-WMX37F&amp;pageCategory=sports&amp;visitorType=returning" style="display:none;visibility:hidden" width="0"></iframe>
</noscript>
<script>
    window.setTimeout(fu

In [None]:
forecasts = []
# Archived snapshots first, then the live page.
for url in tqdm(list(urls)):
    forecasts.append(extract_forecast_cnn(url, comp))
forecasts.append(extract_forecast_cnn(tiprankurl(comp), comp))

# WallstreetZen

https://www.wallstreetzen.com/stocks/us/nasdaq/aapl/stock-forecast

https://www.wallstreetzen.com/stocks/us/nyse/nio/stock-forecast

### Morningstar

* Short Interest Rate
* Dividends
* Ownership % Change Trends
* Executive Team -> Profile -> Recent Transactions 

* Ratings: http://screen.morningstar.com/StkStarRatings/StkRatingsPrevDayChangeDesc1.html

### For Screeners 

https://www.marketbeat.com/market-data/