Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed regex bug in RetrievalQAWithSources in previous update #9898

Merged
merged 8 commits into from Aug 30, 2023

Conversation

nik1097
Copy link
Contributor

@nik1097 nik1097 commented Aug 29, 2023

  • Description: In my previous PR, I had modified the code to catch all kinds of [SOURCES, sources, Source, Sources]. However, this change included checking for a colon or a white space which should actually have been only checking for a colon.
  • Issue: the issue # it fixes (if applicable),
  • Dependencies: any dependencies required for this change,
  • Tag maintainer: @baskaryan
  • Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before submitting. Run make format, make lint and make test to check this locally.

See contribution guidelines for more information on how to write/run tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:

  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use. These live is docs/extras directory.

@vercel
Copy link

vercel bot commented Aug 29, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Aug 29, 2023 11:38pm

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Aug 29, 2023
@nik1097 nik1097 changed the title Fixed regex bug that had crept in with my previous PR Fixed regex bug in RetrievalQAWithSources in previous update Aug 29, 2023
@nik1097
Copy link
Contributor Author

nik1097 commented Aug 29, 2023

Hey @baskaryan we have a demo coming up and I was hoping we could merge and deploy a new version of this as soon as possible. Thank you for your consideration!

@@ -120,9 +120,9 @@ def validate_naming(cls, values: Dict) -> Dict:

def _split_sources(self, answer: str) -> Tuple[str, str]:
"""Split sources from answer."""
if re.search(r"SOURCES?[:\s]", answer, re.IGNORECASE):
if re.search(r"SOURCES?[:]\s", answer, re.IGNORECASE):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the square brackets still needed if there's only one element?

also what's the failure mode here? is it that "sources: " would be split before the ending space? does that matter given we strip the source?

Copy link
Contributor Author

@nik1097 nik1097 Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @baskaryan thanks for the response. You're right I don't need the square braces.
We were looking for "colon or white space" previously.
The failure was the fact that if we had the word "source" or "sources" in our answer it would cut the answer off.
Example:
The source of truth for the test subject. SOURCES: 28-pl.
Current Answer:
The
Expected Answer:
The source of truth for the test subject.
SOURCES: 28pl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also do away with the whitespace regex, since we perform a strip. Let me update my PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fixed it!

@@ -27,6 +27,12 @@
"This Agreement is governed by English law.\n",
"28-pl",
),
(
"According to the sources, the agreement is governed by English law.\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a comma after sources this would've worked before right? should we drop the comma

Copy link
Contributor Author

@nik1097 nik1097 Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I had updated it, my bad! Thanks for pointing it out, I've updated it now.

@baskaryan
Copy link
Collaborator

thank you @nik1097!!

@baskaryan baskaryan merged commit ec362ec into langchain-ai:master Aug 30, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants