This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Staging to master to add github metrics #109

Merged
merged 136 commits into from Jul 4, 2019
Changes from 4 commits
e5b5af4
Added warmup and support for two-sequence classification.
hlums Jun 17, 2019
53cdba0
Added entailment notebook on XNLI
hlums Jun 17, 2019
7cba858
Updated docstring.
hlums Jun 18, 2019
e09a545
Resolved conflict and merged from staging.
hlums Jun 18, 2019
c9d5873
track github metrics
miguelgfierro Jun 18, 2019
7f1bb8a
bug fix: sts-benchmark has extra tabs in some rows which caused incor…
janhavi13 Jun 12, 2019
074bca3
black formatter
janhavi13 Jun 12, 2019
236a64e
update stsbenchmark :notebook:
Jun 18, 2019
f8a2db0
move all the imports to global settings
lishao Jun 19, 2019
066fd91
Merge pull request #108 from microsoft/liqun-fix
saidbleik Jun 19, 2019
4e68640
Merge pull request #107 from microsoft/github_metrics2
saidbleik Jun 19, 2019
5245afa
add dask data loader
saidbleik Jun 19, 2019
bab7499
Updated different readmes
heatherbshapiro Jun 19, 2019
d0b057a
updated azureml section
heatherbshapiro Jun 19, 2019
b407204
add sequential loader
saidbleik Jun 19, 2019
2354a0d
add dask dependency
saidbleik Jun 19, 2019
21e1749
Updated notebook with new data utils and added Hindi example
hlums Jun 19, 2019
946a687
Resolved confict with staging.
hlums Jun 19, 2019
593bb4e
Added convert_to_unicode helper function.
hlums Jun 19, 2019
ed3415b
Updated utils of XNLI dataset.
hlums Jun 19, 2019
4c4f91e
removed scenarios chart from root
heatherbshapiro Jun 20, 2019
4388e6c
add whole-word pretrained models
saidbleik Jun 20, 2019
aa35f62
Updated entailment notebook with results.
hlums Jun 20, 2019
76fa9d7
Fixed formatting.
hlums Jun 20, 2019
2010daf
Removed redundant code.
hlums Jun 20, 2019
fad2564
added optional prob dist predictions
saidbleik Jun 20, 2019
d53c17e
minor edit to preds
saidbleik Jun 20, 2019
c0951fa
rem data_loader
saidbleik Jun 20, 2019
e561dee
add the aml utility function that can get or create workspace as that…
catherine667 Jun 3, 2019
ed26f8f
Integrated Mlflow with AzureMl Gensen deep dive notebook
AbhiramE Jun 20, 2019
8140f67
Fixed documentation to get rid of AzureML logging
AbhiramE Jun 21, 2019
a640869
Merge pull request #112 from microsoft/heather-readme
saidbleik Jun 21, 2019
c53edc4
Added training and prediction time to notebook.
hlums Jun 21, 2019
b0bfcba
Added placeholder for token type ids.
hlums Jun 21, 2019
136cadf
lm name changes
saidbleik Jun 21, 2019
aec2ffb
add namedtuple preds output
saidbleik Jun 21, 2019
55ebf35
Merge pull request #115 from microsoft/sts-benchmark-fix
saidbleik Jun 21, 2019
8b56eec
updated defaults for predict's output
saidbleik Jun 21, 2019
65f1c82
arg name change
saidbleik Jun 21, 2019
399707e
meh
saidbleik Jun 21, 2019
fb3f7dd
Moved _truncate_seq_pair outside of if else block.
hlums Jun 21, 2019
ad3c885
change line length
saidbleik Jun 21, 2019
0bae4ff
add csv loader test
saidbleik Jun 21, 2019
ed4e09b
edits to loader test
saidbleik Jun 21, 2019
80487c8
Code changes based on code review comments.
AbhiramE Jun 21, 2019
9c3a951
add sequential loader test
saidbleik Jun 21, 2019
35fc04c
Merge pull request #113 from microsoft/hlu/two_sequence_utils_and_XNL…
saidbleik Jun 21, 2019
480a08f
Updated NER notebook with new tokenizer api.
hlums Jun 23, 2019
044af9e
Updated ner token preprocessing for Chinese text.
hlums Jun 23, 2019
7b827d6
Updated ner token preprocessing for Chinese text.
hlums Jun 23, 2019
7c35f67
Added probabilities output to BERT token classifier.
hlums Jun 23, 2019
aeb9486
Updated wikigold utils to be consistent with other datasets.
hlums Jun 23, 2019
f4aa615
Removed MSRA NER utils temporarily.
hlums Jun 23, 2019
6abb3e5
Merged with staging.
hlums Jun 23, 2019
291d250
Added NOTICE.txt file with huggingface BERT.
hlums Jun 23, 2019
6189915
Added notebook for Chinese NER.
hlums Jun 23, 2019
9d0d764
merge - token_type_ids & pred proba support
saidbleik Jun 24, 2019
ea5d1ac
Merge pull request #114 from microsoft/bleik-seq-classifier
saidbleik Jun 24, 2019
ab2672a
missing changes
saidbleik Jun 24, 2019
f02bb92
Merge pull request #122 from microsoft/bleik-seq-classifier
saidbleik Jun 24, 2019
6c90cea
Updated readme of NER scenario.
hlums Jun 24, 2019
e5d1492
fix broken link for gensen aml notebook in readme
lishao Jun 24, 2019
b995eae
Merge remote-tracking branch 'origin/staging' into hlu/ner_utils_updates
hlums Jun 24, 2019
48ce3bf
Changed tokenizer_preprocess_ner_text to tokenize_ner.
hlums Jun 24, 2019
c546d98
Merge branch 'hlu/ner_utils_updates' of https://github.com/Microsoft/…
hlums Jun 24, 2019
e364b7f
Merge branch 'hlu/ner_utils_updates', remote-tracking branch 'origin'…
hlums Jun 24, 2019
e562029
Merge pull request #121 from microsoft/hlu/ner_utils_updates
saidbleik Jun 24, 2019
7a8e4cf
Merge pull request #123 from microsoft/liqun-fix
saidbleik Jun 24, 2019
5d86d03
Updated pip version of AzureML Mlflow used in the Pytorch estimator
AbhiramE Jun 24, 2019
4dac5f1
Merge pull request #116 from microsoft/abhiram-mlflow
saidbleik Jun 25, 2019
d3c417f
Update notebook text.
hlums Jun 25, 2019
837a0e7
Fixed bug with join_character
hlums Jun 25, 2019
51c5f1f
Fixed _truncate_seq_pairs bug.
hlums Jun 25, 2019
fd0839b
Updated notebook descriptions.'
hlums Jun 25, 2019
ea49a11
Minor description update.
hlums Jun 25, 2019
5b5ee02
Minor description update.
hlums Jun 25, 2019
f2281d4
Merge pull request #130 from microsoft/hlu/minor_bug_fix
saidbleik Jun 26, 2019
a91b73a
added more tests and bug fixes
saidbleik Jun 26, 2019
84d141b
bug fix in notebook
saidbleik Jun 26, 2019
bd1bdf3
bug fix
saidbleik Jun 26, 2019
4a9633f
old files
saidbleik Jun 26, 2019
d5c61a1
Merge pull request #111 from microsoft/bleik-dataloader
saidbleik Jun 26, 2019
6e33f25
Merge pull request #120 from microsoft/flake8-defaults
saidbleik Jun 26, 2019
8df5780
Merge pull request #125 from microsoft/hlu/third_party_notice_for_hug…
saidbleik Jun 26, 2019
8d13bc0
Renamed conll preprocess function.
hlums Jun 26, 2019
6c75683
create unit-tests.yml as a copy from .ci/azure-pipelines.yml and crea…
bethz Jun 26, 2019
7e7f397
Merge pull request #133 from microsoft/hlu/ner_on_chinese
saidbleik Jun 26, 2019
3f84802
add comments to tests/readme
bethz Jun 26, 2019
d92695f
notebook edits
cocochrane Jun 26, 2019
4202063
autoML notebook with google universal sentence encoder features
cocochrane Jun 12, 2019
7b06d67
Fixed notebooks based on new sts benchmark data loader functions
cocochrane Jun 12, 2019
183f25b
AutoML notebook with google USE embeddings- added descriptions
cocochrane Jun 14, 2019
16b4374
automl model deployed using ACI
janhavi13 Jun 17, 2019
da1ccb7
fixed issue with autoenv.yaml file
janhavi13 Jun 17, 2019
b072df9
Clean notebook, clear workspace information, rerun, lowercase dataset
cocochrane Jun 18, 2019
a33a892
Update widget image
cocochrane Jun 18, 2019
c48443c
File paths working
cocochrane Jun 18, 2019
6288e25
Add AutoML pipelines notebook
cocochrane Jun 19, 2019
8aa53ab
Added in descriptions for pipelines, table of contents, etc.
cocochrane Jun 19, 2019
0fc9376
Fix image and clear long output cell
cocochrane Jun 19, 2019
26cfbbc
Added ACI deployment of both pipeline steps
cocochrane Jun 24, 2019
9b12d60
edits to automl pipelines notebook
cocochrane Jun 24, 2019
551fa93
edits to automl pipelines notebook
cocochrane Jun 24, 2019
cf1d13e
Changed notebook to use automl embeddings vs google USE
cocochrane Jun 24, 2019
574c77c
Notebooks PR ready
janhavi13 Jun 26, 2019
ef03948
Added missing automl_local_deployment_aci
janhavi13 Jun 26, 2019
1055ee9
text fix
janhavi13 Jun 26, 2019
85f74c5
fixed title
janhavi13 Jun 26, 2019
b3e85d4
fixed text issues
janhavi13 Jun 26, 2019
3759b8e
revert file version to staging version
janhavi13 Jun 26, 2019
84e5e04
Fixes based on Abi's PR comments
cocochrane Jun 26, 2019
5a529d8
Merge pull request #135 from microsoft/bezeran-unit-test
saidbleik Jun 26, 2019
f43ab86
renamed automl_with_pipelines to automl_with_pipelines_aks
janhavi13 Jun 26, 2019
6ee164d
Merge pull request #136 from microsoft/courtney-edit-notebooks
saidbleik Jun 26, 2019
ab2ce57
Fixed most PR comments
janhavi13 Jun 27, 2019
af1d7a6
black formatting and some text addition in deployment section
janhavi13 Jun 28, 2019
12043e1
black formatting and some text addition in deployment section
janhavi13 Jun 28, 2019
5de635c
[WIP]resolve review comments and black formatting
janhavi13 Jun 28, 2019
ec2c827
azure devops tests
miguelgfierro Jun 28, 2019
ce9c066
Address PR comments for local automl aci deployment
Jun 28, 2019
9b56315
Resolve PR comment on pipelines notebook
Jun 28, 2019
c5a4b05
BiDAF quickstart deployment notebook
Jun 28, 2019
db104db
Add text to notebook and solve deployment error
Jun 29, 2019
9b3ce07
Empty deep dive notebook
Jun 29, 2019
685e7ba
Update READMEs
Jun 30, 2019
950761f
Update readmes
Jun 30, 2019
2a5e902
Clean folder
Jun 30, 2019
f02d39f
Remove tensorflow import statement for notebook
Jun 30, 2019
5d53db0
Merge pull request #137 from microsoft/courtney-janhavi-automl
saidbleik Jul 1, 2019
3174ade
Merge pull request #138 from microsoft/test_devops
miguelgfierro Jul 2, 2019
e2235b6
PR edits
Jul 3, 2019
d7d9129
Merge pull request #139 from microsoft/courtney-bidaf
saidbleik Jul 3, 2019
b4405ce
:bug: in Readme
miguelgfierro Jul 3, 2019
5e21302
remove yahoo_answers utils
saidbleik Jul 3, 2019
88c7243
readme edits
saidbleik Jul 3, 2019
a2a696a
Merge pull request #144 from microsoft/bleik-cleanup
miguelgfierro Jul 3, 2019
5 changes: 1 addition & 4 deletions scenarios/sentence_similarity/gensen_aml_deep_dive.ipynb
@@ -127,7 +127,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 1,
"metadata": {
"scrolled": true
},
@@ -887,8 +887,6 @@
}
],
"source": [
"import shutil\n",
"\n",
"gensen_folder = os.path.join(project_folder,'utils_nlp/gensen/')\n",
"shutil.copy('gensen_train.py', gensen_folder)\n",
"shutil.copy('gensen_config.json', gensen_folder)"
@@ -1030,7 +1028,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
26 changes: 26 additions & 0 deletions tests/ci/repo_metrics_pipeline.yml
@@ -0,0 +1,26 @@

jobs:
- job: Repometrics
pool:
vmImage: 'ubuntu-16.04'

steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.6'
architecture: 'x64'

- script: |
cp tools/repo_metrics/config_template.py tools/repo_metrics/config.py
sed -i ''s/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/$(github_token)/g'' tools/repo_metrics/config.py
sed -i ''s/XXXXXXXXXXXXXXXXXXXXXXXXX/$(cosmosdb_connectionstring)/g'' tools/repo_metrics/config.py
displayName: Configure CosmosDB Connection

- script: |
python -m pip install "python-dateutil>=2.8.0" "pymongo>=3.8.0" "gitpython>2.1.11" "requests>=2.21.0"
python tools/repo_metrics/track_metrics.py --github_repo "https://github.com/microsoft/nlp" --save_to_database
displayName: Python script to record stats
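The configuration step in this pipeline fills two placeholders in the config template with `sed`. The same substitution can be sketched in Python; this is only an illustration, not part of the PR. The helper name `fill_config` and its parameters are hypothetical, and the placeholder lengths (40 and 25 `X` characters) are read off the two `sed` commands above.

```python
# Hypothetical helper mirroring the two sed substitutions in the pipeline step.
GITHUB_TOKEN_PLACEHOLDER = "X" * 40  # length taken from the first sed command
COSMOSDB_PLACEHOLDER = "X" * 25      # length taken from the second sed command


def fill_config(template_text, github_token, cosmosdb_value):
    """Return the template text with both placeholders substituted."""
    # Replace the longer placeholder first: a run of 25 X's also matches
    # inside the 40-character placeholder, so the order matters.
    filled = template_text.replace(GITHUB_TOKEN_PLACEHOLDER, github_token)
    return filled.replace(COSMOSDB_PLACEHOLDER, cosmosdb_value)
```

One reason to prefer a literal string replacement is that `str.replace` does not interpret `/` or `&` in the substituted value, whereas a `sed` `s/.../.../` expression treats both specially.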




18 changes: 14 additions & 4 deletions tools/repo_metrics/README.md
@@ -1,6 +1,6 @@
# Repository Metrics

[![Build status](https://msdata.visualstudio.com/AlgorithmsAndDataScience/_apis/build/status/Recommenders/Recommenders%20repo%20stats)](https://msdata.visualstudio.com/AlgorithmsAndDataScience/_build/latest?definitionId=5206)
[![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/repo_metrics?branchName=master)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=36&branchName=master)

We developed a script that allows us to track the metrics of the Recommenders repo. Some of the metrics we can track are listed here:

@@ -10,17 +10,27 @@ We developed a script that allows us to track the metrics of the Recommenders re
* Number of views
* Number of lines of code

To see the full list of metrics, see [git_stats.py](scripts/repo_metrics/git_stats.py)
To see the full list of metrics, see [git_stats.py](git_stats.py)

The first step is to set up the credentials: copy the configuration file and fill in the GitHub and CosmosDB credentials:

cp scripts/repo_metrics/config_template.py scripts/repo_metrics/config.py
cp tools/repo_metrics/config_template.py tools/repo_metrics/config.py

To track the current state of the repository and save it to CosmosDB:

python scripts/repo_metrics/track_metrics.py --github_repo "https://github.com/Microsoft/Recommenders" --save_to_database

To track an event related to this repository and save it to CosmosDB:

python scripts/repo_metrics/track_metrics.py --event "Today we did our first blog of the project" --event_date 2018-12-01 --save_to_database
python tools/repo_metrics/track_metrics.py --event "Today we did our first blog of the project" --event_date 2018-12-01 --save_to_database
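The flags used in the commands above could be parsed roughly as follows. This is a minimal sketch with `argparse`, not the actual `parse_args` from `track_metrics.py` (whose body is not shown in this diff); the help strings and the `YYYY-MM-DD` date parsing are assumptions based on the example commands.

```python
import argparse
from datetime import datetime


def parse_args(argv=None):
    """Parse the flags shown in the README commands (illustrative only)."""
    parser = argparse.ArgumentParser(description="Track repository metrics")
    parser.add_argument("--github_repo", type=str, help="GitHub repository URL")
    parser.add_argument("--event", type=str, help="Free-text description of an event")
    parser.add_argument(
        "--event_date",
        # Assumption: dates are given as YYYY-MM-DD, as in the README example.
        type=lambda s: datetime.strptime(s, "%Y-%m-%d"),
        default=None,
        help="Date of the event, formatted as YYYY-MM-DD",
    )
    parser.add_argument(
        "--save_to_database",
        action="store_true",
        help="Store the collected data in the database",
    )
    # argv=None makes argparse read sys.argv[1:], as a CLI entry point would.
    return parser.parse_args(argv)
```

Passing an explicit list, for example `parse_args(["--event", "First blog", "--event_date", "2018-12-01"])`, makes the parser easy to exercise without touching the command line.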


### Setting up Azure CosmosDB

The API used to track the GitHub metrics is the [Mongo API](https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb-introduction).

The database name and collection names are defined in the [config file](config_template.py). There are two main collections, defined as `COLLECTION_GITHUB_STATS` and `COLLECTION_EVENTS`, which store the information described in the previous section.

**IMPORTANT NOTE**: If the database and the collections are created directly through the portal, a common partition key should be defined. We recommend using `date` as the partition key.
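To illustrate why a common `date` key is convenient, here is a minimal sketch of writing an event document that always carries a `date` field. The function names `event_document` and `save_event` are hypothetical, and storing the date as an ISO 8601 string is an assumption; the actual script may store it differently. The `collection` argument is anything exposing `insert_one`, such as a `pymongo` collection.

```python
from datetime import datetime


def event_document(event, date):
    """Build an event document with the `date` field used as partition key."""
    # Assumption: the date is stored as an ISO 8601 string.
    return {"event": event, "date": date.isoformat()}


def save_event(collection, event, date):
    """Insert the event document into a collection and return the document."""
    document = event_document(event, date)
    collection.insert_one(document)
    return document


# With pymongo and a filled-in config, the collection could be obtained as:
#   from pymongo import MongoClient
#   collection = MongoClient(CONNECTION_STRING)[DATABASE][COLLECTION_EVENTS]
#   save_event(collection, "Today we did our first blog", datetime(2018, 12, 1))
```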


2 changes: 1 addition & 1 deletion tools/repo_metrics/config_template.py
@@ -7,7 +7,7 @@

# CosmosDB Mongo API
CONNECTION_STRING = "mongodb://XXXXXXXXXXXXXXXXXXXXXXXXX.documents.azure.com:10255/?ssl=true&replicaSet=globaldb"
DATABASE = "reco_stats"
DATABASE = "nlp_stats"
COLLECTION_GITHUB_STATS = "github_stats"
COLLECTION_EVENTS = "events"

14 changes: 10 additions & 4 deletions tools/repo_metrics/track_metrics.py
@@ -5,7 +5,6 @@
import os

# Need to append a full path instead of relative path.
# This seems to be an issue from Azure DevOps command line task.
# NOTE this does not affect running directly in the shell.
sys.path.append(os.getcwd())
import argparse
@@ -14,9 +13,8 @@
from datetime import datetime
from dateutil.parser import isoparse
from pymongo import MongoClient
from datetime import datetime
from scripts.repo_metrics.git_stats import Github
from scripts.repo_metrics.config import (
from tools.repo_metrics.git_stats import Github
from tools.repo_metrics.config import (
GITHUB_TOKEN,
CONNECTION_STRING,
DATABASE,
@@ -32,6 +30,7 @@

def parse_args():
"""Argument parser.

Returns:
obj: Parser.
"""
@@ -61,8 +60,10 @@ def parse_args():

def connect(uri="mongodb://localhost"):
"""Mongo connector.

Args:
uri (str): Connection string.

Returns:
obj: Mongo client.
"""
@@ -78,9 +79,11 @@ def connect(uri="mongodb://localhost"):

def event_as_dict(event, date):
"""Encodes a string event input as a dictionary with the date.

Args:
event (str): Details of an event.
date (datetime): Date of the event.

Returns:
dict: Dictionary with the event and the date.
"""
@@ -89,8 +92,10 @@ def event_as_dict(event, date):

def github_stats_as_dict(github):
"""Encodes Github statistics as a dictionary with the date.

Args:
github (obj): Github object.

Returns:
dict: Dictionary with Github details and the date.
"""
@@ -125,6 +130,7 @@ def github_stats_as_dict(github):

def tracker(args):
"""Main function to track metrics.

Args:
args (obj): Parsed arguments.
"""