<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Remoteok - Post daily jobs on Slack
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Remoteok/Remoteok_Post_daily_jobs_on_slack.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/open_in_naas.svg"/></a>

**Tags:** #remoteok #jobs #slack #gsheet #naas_drivers

**Author:** [Sanjeet Attili](https://www.linkedin.com/in/sanjeet-attili-760bab190/)

## Input

### Import libraries

In [1]:
import pandas as pd
import requests
from datetime import datetime
import time
from naas_drivers import gsheet, slack
import naas

### Set the Scheduler

In [None]:
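# Uncomment the next line to run this notebook every day at 09:00 (cron: minute hour day month weekday)
# naas.scheduler.add(recurrence="0 9 * * *")
# naas.scheduler.delete()  # Uncomment this line to delete your scheduler if needed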
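
The `recurrence` value above is a five-field cron expression. As a minimal sketch (the helper name `split_cron` is hypothetical and not part of Naas), the fields can be labelled like this to check what a schedule means before enabling it:

```python
# Standard five-field cron layout: minute, hour, day of month, month, day of week.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


schedule = split_cron("0 9 * * *")
print(schedule)
```

Here minute `0` and hour `9` with `*` in the remaining fields means once a day at 09:00.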

### Variables

In [2]:
REMOTEOK_API = "https://remoteok.com/api"
REMOTEOK_DATETIME = "%Y-%m-%dT%H:%M:%S"
NAAS_DATETIME = "%Y-%m-%d %H:%M:%S"

### Setup slack channel configuration

In [14]:
SLACK_TOKEN = "xoxb-1481042297777-3085654341191-xxxxxxxxxxxxxxxxxxxxxxxxx"
SLACK_CHANNEL = "05_jobs"
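
Hard-coding a bot token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here under the assumption that the token is exported as an environment variable named `SLACK_TOKEN` (the variable names and the helper `load_slack_settings` are illustrative, not part of Naas), is to read the settings at run time:

```python
import os


def load_slack_settings(env=os.environ):
    """Read the Slack token and channel from a mapping such as os.environ."""
    token = env.get("SLACK_TOKEN")
    if not token:
        raise RuntimeError("SLACK_TOKEN is not set")
    # Fall back to the channel used in this notebook when none is configured.
    channel = env.get("SLACK_CHANNEL", "05_jobs")
    return token, channel


# Example with an explicit mapping instead of the real environment.
token, channel = load_slack_settings({"SLACK_TOKEN": "xoxb-example"})
print(channel)
```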

### Setup sheet log data

For the driver to read your Google Sheet, first share the sheet with the service account linked to Naas:
naas-share@naas-gsheets.iam.gserviceaccount.com

In [4]:
spreadsheet_id = "1EBefhkbmqaXMZLRCiafabf6xxxxxxxxxxxxxxxxxxx"
sheet_name = "remoteok_updated"

### Get the sheet log of jobs

In [12]:
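try:
    # Load the jobs logged by previous runs so they are not appended again
    df_jobs_log = gsheet.connect(spreadsheet_id).get(sheet_name=sheet_name)
except KeyError:
    # A KeyError is treated as an empty sheet: start with an empty log
    print('Gsheet is empty!')
    df_jobs_log = pd.DataFrame()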

### Setup RemoteOk

### Setting the parameters 

In [6]:
categories = ['machine learning', 'data science', 'nlp', 'deep learning', 'computer vision', 'data', 'natural language processing', 'data engineer']
date_from = -30  # look-back window in days relative to now; must be negative
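
The look-back window can also be expressed with timezone-aware datetimes. The sketch below assumes the job dates are ISO 8601 strings with a UTC offset such as `2022-02-28T11:00:07+00:00`. The helper `is_recent` is hypothetical, and it keeps the offset rather than discarding it:

```python
from datetime import datetime, timedelta, timezone


def is_recent(date_string, days_back, now=None):
    """Return True if date_string falls within the last abs(days_back) days."""
    now = now or datetime.now(timezone.utc)
    published = datetime.fromisoformat(date_string)
    if published.tzinfo is None:
        # Assumption: dates without an offset are treated as UTC.
        published = published.replace(tzinfo=timezone.utc)
    # abs() accepts the negative convention used for date_from in this notebook.
    cutoff = now - timedelta(days=abs(days_back))
    return published >= cutoff


now = datetime(2022, 3, 1, tzinfo=timezone.utc)
print(is_recent("2022-02-28T11:00:07+00:00", -30, now))  # True
print(is_recent("2022-01-01T00:00:00+00:00", -30, now))  # False
```

Comparing aware datetimes avoids depending on the local timezone of the machine running the notebook.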

## Model

### Get jobs from RemoteOk

In [10]:
def get_jobs(remoteok_url, categories):
    df = pd.DataFrame()
    # Send a browser-like User-Agent with each request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
    }
    # Oldest accepted publication time, in seconds since the epoch (date_from is negative)
    required_time = time.time() + date_from * 24 * 60 * 60
    index = 0
    for tag in categories:
        url = remoteok_url + f"?tag={tag}"
        res = requests.get(url, headers=headers)
        # Raise on HTTP errors instead of returning the exception object
        res.raise_for_status()
        job_details = res.json()

        # The first element of the response is not a job, so it is skipped;
        # a response with a single element therefore contains no jobs
        for job in job_details[1:]:
            # Drop the UTC offset suffix before parsing
            date = job['date'].split('+')[0]
            publication_time = datetime.strptime(date, REMOTEOK_DATETIME).timestamp()

            if publication_time >= required_time:
                df.loc[index, 'Date'] = datetime.fromtimestamp(publication_time).strftime(NAAS_DATETIME)
                df.loc[index, 'Company'] = job.get('company')
                df.loc[index, 'Role'] = job.get('position')
                df.loc[index, 'tags'] = ", ".join(job.get('tags') or [])
                df.loc[index, 'location'] = job.get('location')
                df.loc[index, 'url'] = job.get('url')
                index += 1

    # Sort newest first; an empty frame has no 'Date' column to sort on
    if not df.empty:
        df = df.sort_values(by='Date', ascending=False)
    return df

df_jobs = get_jobs(REMOTEOK_API, categories)
df_jobs.head()

| | Date | Company | Role | tags | location | url |
|---|---|---|---|---|---|---|
| 1 | 2022-02-28 11:00:07 | Hipcamp | Data Science Lead Marketplace | data science, marketing, engineer, exec | San Francisco, CA | https://remoteOK.com/remote-jobs/109291-remote... |
| 17 | 2022-02-28 11:00:07 | Hipcamp | Data Science Lead Marketplace | data science, marketing, engineer, exec | San Francisco, CA | https://remoteOK.com/remote-jobs/109291-remote... |
| 18 | 2022-02-27 18:00:07 | Kikoff | Data Analyst | mobile, data science, marketing, engineer, bac... | Remote, United States | https://remoteOK.com/remote-jobs/109261-remote... |
| 2 | 2022-02-27 18:00:07 | Kikoff | Data Analyst | mobile, data science, marketing, engineer, bac... | Remote, United States | https://remoteOK.com/remote-jobs/109261-remote... |
| 0 | 2022-02-27 07:00:01 | Generally Intelligent | Machine Learning Engineer | machine learning, engineer | Remote-only | https://remoteOK.com/remote-jobs/109238-remote... |
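
The sample output lists the same posting more than once (for example rows 1 and 17), because a job can match several of the queried tags. A minimal sketch of de-duplicating by URL on plain dictionaries follows. The helper name `unique_by_url` and the example URLs are hypothetical:

```python
def unique_by_url(jobs):
    """Keep the first occurrence of each job URL, preserving order."""
    seen = set()
    unique = []
    for job in jobs:
        url = job.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(job)
    return unique


jobs = [
    {"Company": "Hipcamp", "url": "https://example.com/jobs/1"},
    {"Company": "Kikoff", "url": "https://example.com/jobs/2"},
    {"Company": "Hipcamp", "url": "https://example.com/jobs/1"},  # same job, other tag
]
deduped = unique_by_url(jobs)
print([job["Company"] for job in deduped])
```

On a DataFrame such as `df_jobs`, `df_jobs.drop_duplicates(subset="url")` should have the same effect.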


## Output

### Sending the data to sheets

In [13]:
def send_to_sheets(jobs_data):
    # Skip jobs whose URL is already present in the sheet log
    if len(df_jobs_log) != 0 and len(jobs_data) != 0:
        job_urls = df_jobs_log.url.unique()
        jobs_data = jobs_data[~jobs_data.url.isin(job_urls)]
        jobs_data = jobs_data.sort_values(by='Date', ascending=False)

    # Nothing new to append
    if len(jobs_data) == 0:
        print("No new jobs to add to the sheet.")
        return

    gsheet.connect(spreadsheet_id).send(sheet_name=sheet_name,
                                        data=jobs_data,
                                        append=True)

send_to_sheets(df_jobs)

### Send all job links to the slack channel

In [18]:
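if len(df_jobs) > 0:
    # Connect once and reuse the connection for every message
    slack_client = slack.connect(SLACK_TOKEN)
    for _, row in df_jobs.iterrows():
        # Angle brackets mark the URL as a link in Slack
        slack_client.send(SLACK_CHANNEL, f"<{row.url}>")
else:
    print("Nothing to publish in Slack!")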

✉️ Message sent
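
The messages above contain only the bare job URL. As a sketch of a more readable message, the role, company, and location columns could be combined with Slack's `<url|label>` link syntax. The helper `format_job_message`, the label text, and the example URL are illustrative assumptions:

```python
def format_job_message(job):
    """Build a Slack mrkdwn message for one job row (dict-like)."""
    role = job.get("Role") or "Unknown role"
    company = job.get("Company") or "Unknown company"
    location = job.get("location") or "Location not specified"
    # *text* renders as bold and <url|label> as a labelled link in Slack mrkdwn.
    return f"*{role}* at {company} ({location})\n<{job['url']}|View on RemoteOk>"


message = format_job_message({
    "Role": "Data Analyst",
    "Company": "Kikoff",
    "location": "Remote, United States",
    "url": "https://example.com/jobs/2",  # hypothetical URL
})
print(message)
```

The resulting string could be passed as the message argument of the `send` call in the loop above, for example with `format_job_message(row.to_dict())`.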
