This repository has been archived by the owner on May 9, 2024. It is now read-only.

Validation (#5)
* Update to 2024.0 (#4)

* Update to 2024.0

* Changes for reviews round 1

* empty line in env yml file

* Changes for reviews round 2

* fix environment
eromomon committed Feb 2, 2024
1 parent b940a94 commit 4b43109
Showing 23 changed files with 715 additions and 483 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2023, Intel Corporation
+Copyright (c) 2024, Intel Corporation
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
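Much of this commit is a mechanical copyright-year bump (2022 or 2023 to 2024) in LICENSE and in every Python source header. A minimal sketch of how such a bump could be scripted, assuming the headers keep the two shapes visible in this diff (`Copyright (c) YEAR, Intel Corporation` and `Copyright (C) YEAR Intel Corporation`); `bump_copyright_year` is a hypothetical helper, not something in the repository.

```python
import re

# Hypothetical helper. Matches both header shapes seen in this diff:
#   "Copyright (c) 2023, Intel Corporation"   (LICENSE)
#   "# Copyright (C) 2023 Intel Corporation"  (Python sources)
_HEADER = re.compile(r"(Copyright \([cC]\) )(\d{4})(,? Intel Corporation)")


def bump_copyright_year(text: str, year: int) -> str:
    """Replace the year in every Intel copyright header found in `text`."""
    return _HEADER.sub(lambda m: f"{m.group(1)}{year}{m.group(3)}", text)


if __name__ == "__main__":
    print(bump_copyright_year("# Copyright (C) 2022 Intel Corporation", 2024))
    print(bump_copyright_year("Copyright (c) 2023, Intel Corporation", 2024))
```

Keeping the `(c)`/`(C)` and comma groups in the replacement preserves each file's existing header style, which is what the diff does.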
1,019 changes: 678 additions & 341 deletions README.md

Large diffs are not rendered by default.

Binary file removed assets/e2e-embedding-original.png
Binary file not shown.
Binary file removed assets/e2e-embedding-reranking.png
Binary file not shown.
Binary file added assets/embedding_flow.png
Binary file removed assets/offline-embedding-relative-perf.png
Binary file not shown.
Binary file added assets/real_time_flow.png
Binary file removed assets/realtime-search-relative-perf.png
Binary file not shown.
Binary file added assets/reranker.png
19 changes: 0 additions & 19 deletions data/README.md

This file was deleted.

2 changes: 1 addition & 1 deletion data/download_data.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2022 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
21 changes: 0 additions & 21 deletions env/intel/intel.yml

This file was deleted.

18 changes: 18 additions & 0 deletions env/intel_env.yml
@@ -0,0 +1,18 @@
+name: vertical_search_intel
+channels:
+- intel
+- conda-forge
+dependencies:
+- python=3.9
+- intel-extension-for-pytorch==2.0.100
+- pandas=1.5.3
+- cpuonly=1.0
+- neural-compressor=2.3.1
+- opencv=4.8.1
+- scipy=1.10.1
+- datasets=2.16.1
+- gperftools=2.10
+- psutil=5.8.0
+- transformers=4.37.1
+- sentence-transformers=2.3.1
+
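The new `env/intel_env.yml` mixes two pin operators: `intel-extension-for-pytorch==2.0.100` uses `==`, while every other dependency uses a single `=`. In conda's match-spec syntax, `==` asks for an exact version, whereas `=` is a fuzzy match that also accepts releases extending that version (for example `numpy=1.11` accepts `1.11.0`, `1.11.1`, and so on). A minimal sketch of the difference under simplified semantics; conda's real matcher also handles build strings, ranges, and version normalisation, which this ignores, and `parse_spec`/`matches` are hypothetical helpers.

```python
from typing import NamedTuple


class Spec(NamedTuple):
    name: str
    op: str       # "==" (exact), "=" (fuzzy prefix), or "" (unpinned)
    version: str


def parse_spec(line: str) -> Spec:
    """Parse a dependency entry such as '- pandas=1.5.3' from an env YAML list."""
    entry = line.strip().lstrip("-").strip()
    for op in ("==", "="):  # test "==" first so it is not split on a single "="
        if op in entry:
            name, version = entry.split(op, 1)
            return Spec(name.strip(), op, version.strip())
    return Spec(entry, "", "")


def matches(spec: Spec, installed: str) -> bool:
    """Simplified check: '==' is exact; '=' accepts the same version or one that
    extends it component-wise (1.5.3 accepts 1.5.3 and 1.5.3.1, not 1.5.30)."""
    if spec.op == "==":
        return installed == spec.version
    if spec.op == "=":
        return installed == spec.version or installed.startswith(spec.version + ".")
    return True  # unpinned: anything goes
```

Under this reading, `python=3.9` accepts any 3.9.x interpreter, so only the `==`-pinned package is locked to a single version.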
16 changes: 0 additions & 16 deletions env/stock/stock.yml

This file was deleted.

55 changes: 0 additions & 55 deletions setupenv.sh

This file was deleted.

2 changes: 1 addition & 1 deletion src/configs/vse_config_base.yml
@@ -12,4 +12,4 @@ model:
 inference:
   top_k : 5
   score_function : dot # cos_sim, dot
-  corpus_embeddings_path : ../saved_output/embeddings.pkl
+  corpus_embeddings_path : output/embeddings.pkl
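Both inference configs set `top_k : 5` and `score_function : dot`, with `cos_sim` listed as the alternative. A minimal pure-Python sketch of what those two settings choose between, assuming each corpus entry is a precomputed embedding vector (as the `corpus_embeddings_path` pickle suggests); `top_k_search` is a hypothetical helper, not the repository's search code. The two scores differ only in normalisation: cosine similarity divides the dot product by both vector norms, so it ignores embedding magnitude.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by both L2 norms; assumes non-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


SCORERS = {"dot": dot, "cos_sim": cos_sim}


def top_k_search(query, corpus, top_k=5, score_function="dot") -> List[Tuple[int, float]]:
    """Return (corpus_index, score) pairs for the top_k highest-scoring entries."""
    score = SCORERS[score_function]
    scored = ((i, score(query, doc)) for i, doc in enumerate(corpus))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [3.0, 3.0], [0.0, 1.0]]
    query = [1.0, 0.2]
    print(top_k_search(query, corpus, top_k=2, score_function="dot"))
    print(top_k_search(query, corpus, top_k=2, score_function="cos_sim"))
```

With these toy vectors, `dot` ranks the large-magnitude entry `[3.0, 3.0]` first, while `cos_sim` prefers `[1.0, 0.0]`, which points more nearly in the query's direction.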
4 changes: 2 additions & 2 deletions src/configs/vse_config_inc.yml
@@ -9,10 +9,10 @@ model:
   max_seq_length: 128
 
   # inc required config parameters
-  path: ../saved_models/inc_int8
+  path: output/models/inc_int8
 
 # inference config
 inference:
   top_k : 5
   score_function : dot # cos_sim, dot
-  corpus_embeddings_path : ../saved_output/embeddings.pkl
+  corpus_embeddings_path : output/embeddings.pkl
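Both configs move their outputs from `../saved_output/...` and `../saved_models/...` to `output/...`. Because these are relative paths, where the files land depends on what they are resolved against; Python's `open()` resolves a relative path against the current working directory, not against the YAML file's location. A minimal sketch for comparing the two interpretations; `resolve_from` is a hypothetical helper, and the `src` and repository-root working directories in the example are assumptions, not documented run locations.

```python
import posixpath


def resolve_from(base_dir: str, config_path: str) -> str:
    """Resolve a config path against base_dir, collapsing '..' segments.
    Absolute paths are returned normalised but otherwise unchanged."""
    if posixpath.isabs(config_path):
        return posixpath.normpath(config_path)
    return posixpath.normpath(posixpath.join(base_dir, config_path))


if __name__ == "__main__":
    old = "../saved_output/embeddings.pkl"
    new = "output/embeddings.pkl"
    # Assumed working directories: the repo's src/ folder vs. the repo root "."
    print(resolve_from("src", old))  # saved_output/embeddings.pkl
    print(resolve_from("src", new))  # src/output/embeddings.pkl
    print(resolve_from(".", new))    # output/embeddings.pkl
```

Launched from `src/`, the old path pointed at a repository-level `saved_output/` folder, while the new path writes into `output/` under whichever directory the script is started from.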
2 changes: 1 addition & 1 deletion src/display_rankings.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
15 changes: 4 additions & 11 deletions src/run_document_embedder.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
@@ -195,9 +195,9 @@ def main(flags):
     max_sequence_length = conf['model']['max_seq_length']
 
     # use IPEX to optimize model
-    if flags.intel:
-        import intel_extension_for_pytorch as ipex
-        embedder = ipex.optimize(embedder, dtype=torch.float32)
+
+    import intel_extension_for_pytorch as ipex
+    embedder = ipex.optimize(embedder, dtype=torch.float32)
 
     sample_inputs = tokenizer.batch_decode([
         random.sample(
@@ -283,13 +283,6 @@ def main(flags):
                         default=100
                         )
 
-    parser.add_argument('--intel',
-                        required=False,
-                        help="use intel pytorch extension to optimize model",
-                        action="store_true",
-                        default=False
-                        )
-
     FLAGS = parser.parse_args()
 
     main(FLAGS)
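With the `--intel` flag gone, `run_document_embedder.py` (and `run_query_search.py` in the same way) imports `intel_extension_for_pytorch` unconditionally, so the scripts now raise `ModuleNotFoundError` in any environment that lacks it, such as a stock PyTorch install. If a soft fallback were ever wanted, the standard library can check for a top-level module before importing it. A minimal sketch with a hypothetical `has_module` helper; the optimisation call it would guard is the `ipex.optimize(embedder, dtype=torch.float32)` line from the diff and is not reproduced here.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if the top-level module `name` is importable, checked without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("intel_extension_for_pytorch"):
        print("IPEX found: ipex.optimize(...) can be applied")
    else:
        print("IPEX missing: the unconditional import would raise ModuleNotFoundError")
```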
2 changes: 1 addition & 1 deletion src/run_quantize_inc.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
17 changes: 6 additions & 11 deletions src/run_query_search.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
@@ -233,9 +233,11 @@ def main(flags):
     embedder.eval()
     max_sequence_length = conf['model']['max_seq_length']
 
-    if flags.intel:
-        import intel_extension_for_pytorch as ipex
-        embedder = ipex.optimize(embedder, dtype=torch.float32)
+    # use IPEX to optimize model
+
+    import intel_extension_for_pytorch as ipex
+    embedder = ipex.optimize(embedder, dtype=torch.float32)
+
     sample_inputs = tokenizer.batch_decode([
         random.sample(
             range(tokenizer.vocab_size), max_sequence_length) for
@@ -350,13 +350,6 @@ def main(flags):
                         default=100
                         )
 
-    parser.add_argument('--intel',
-                        required=False,
-                        help="use intel pytorch extension to optimize model",
-                        action="store_true",
-                        default=False
-                        )
-
     FLAGS = parser.parse_args()
 
     main(FLAGS)
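Removing the `--intel` argument also changes the command-line contract: `argparse` rejects options it does not know, so an older invocation that still passes `--intel` now exits with an "unrecognized arguments" error instead of being accepted. A minimal sketch of that behaviour using a hypothetical stand-in parser; the real scripts define other arguments, and `--example_option` below is invented purely for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the scripts' parsers, now without '--intel'.
    parser = argparse.ArgumentParser()
    parser.add_argument("--example_option", type=int, default=100)  # hypothetical option
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # parse_known_args() returns leftovers instead of exiting,
    # which shows exactly what parse_args() would reject.
    args, unknown = parser.parse_known_args(["--intel"])
    print(args.example_option, unknown)  # 100 ['--intel']
    try:
        parser.parse_args(["--intel"])
    except SystemExit as exc:
        print("parse_args exited with status", exc.code)  # 2
```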
2 changes: 1 addition & 1 deletion src/utils/dataloader.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914
2 changes: 1 addition & 1 deletion src/utils/embed.py
@@ -1,7 +1,7 @@
 # !/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 
 # pylint: disable=C0415,E0401,R0914

0 comments on commit 4b43109
