[Bug]CodeHierarchyNodeParser: The child text is not contained inside the parent text. #12924
Comments
The error "The child text is not contained inside the parent text" that you're encountering means that a child node's text could not be found in its parent node's text. To address this, ensure the text splitting logic accurately partitions the parent text into child nodes. Verify that the start and end indices of child nodes are correctly calculated, and check that metadata, if any, does not disrupt the matching of child text within the parent text. Adding detailed logging around the creation of child nodes and their matching process can help pinpoint where the mismatch occurs.
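As a rough sketch of the logging suggestion, the helper below works on plain strings rather than on the pack's node objects. The name `report_child_matches` is hypothetical and not part of llama-index; it only compares how often each child text is listed against how often it occurs in the parent text.

```python
from collections import Counter


def report_child_matches(parent_text: str, child_texts: list[str]) -> dict[str, tuple[int, int]]:
    """Map each distinct child text to (times listed as a child, occurrences in parent)."""
    listed = Counter(child_texts)
    return {text: (n, parent_text.count(text)) for text, n in listed.items()}


if __name__ == "__main__":
    parent = "a = 1\nb = 2\na = 1\n"
    for text, (n_listed, n_found) in report_child_matches(parent, ["a = 1", "b = 2", "a = 1"]).items():
        # A child listed more than once is a candidate for the duplicate-text problem.
        print(f"{text!r}: listed {n_listed}x, found {n_found}x in parent")
```

A child text that is listed more than once, or that is found zero times, is worth inspecting first.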
The bug occurs because the file contains two function definitions that are exactly the same:

```python
if isinstance(datetime_or_doy, pd.DatetimeIndex):
    to_doy = tools._pandas_to_doy  # won't be evaluated unless necessary
    def to_datetimeindex(x): return x  # noqa: E306
    to_output = partial(pd.Series, index=datetime_or_doy)
elif isinstance(datetime_or_doy, pd.Timestamp):
    to_doy = tools._pandas_to_doy
    to_datetimeindex = \
        tools._datetimelike_scalar_to_datetimeindex
    to_output = tools._scalar_out
elif isinstance(datetime_or_doy,
                (datetime.date, datetime.datetime, np.datetime64)):
    to_doy = tools._datetimelike_scalar_to_doy
    to_datetimeindex = \
        tools._datetimelike_scalar_to_datetimeindex
    to_output = tools._scalar_out
elif np.isscalar(datetime_or_doy):  # ints and floats of various types
    def to_doy(x): return x  # noqa: E306
    to_datetimeindex = partial(tools._doy_to_datetimeindex,
                               epoch_year=epoch_year)
    to_output = tools._scalar_out
else:  # assume that we have an array-like object of doy
    def to_doy(x): return x  # noqa: E306
    to_datetimeindex = partial(tools._doy_to_datetimeindex,
                               epoch_year=epoch_year)
    to_output = tools._array_out
```

The function `def to_doy(x): return x  # noqa: E306` is defined twice. With skeletonization enabled, the function `_skeletonize` tries to replace each child function with the `replacement_text`. The first child node enters the function with text = `def to_doy(x): return x  # noqa: E306`. Then, at line 832 of llama-index-packs/llama-index-packs-code-hierarchy/llama_index/packs/code_hierarchy/code_hierarchy.py:

```python
parent_node.text = parent_node.text.replace(child_node.text, replacement_text)
```

Both occurrences of `def to_doy(x): return x  # noqa: E306` are replaced with the replacement text. The second child node then enters with exactly the same text, but that text has already been replaced while handling the first child node, so the error is raised:

```python
if child_node.text not in parent_node.text:
    raise ValueError("The child text is not contained inside the parent text.")
```

Proposed Solution

When replacing the text, replace only its first occurrence:

```python
index = parent_node.text.find(child_node.text)
# If the text is found, replace only the first occurrence
if index != -1:
    parent_node.text = parent_node.text[:index] + replacement_text + parent_node.text[index + len(child_node.text):]
```

I will open a PR with the proposed solution.
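The failure and the proposed fix can be reproduced on plain strings. The sketch below is not the pack's code: `skeletonize`, `replace_all`, `replace_first`, and the placeholder text are hypothetical stand-ins that only mimic the check-then-replace loop described above.

```python
from typing import Callable

ERROR = "The child text is not contained inside the parent text."


def replace_all(parent: str, child: str, replacement: str) -> str:
    """Current behaviour: str.replace without a count replaces every occurrence."""
    return parent.replace(child, replacement)


def replace_first(parent: str, child: str, replacement: str) -> str:
    """Proposed behaviour: replace only the first occurrence of child."""
    index = parent.find(child)
    if index == -1:
        raise ValueError(ERROR)
    return parent[:index] + replacement + parent[index + len(child):]


def skeletonize(parent: str, children: list[str],
                replace: Callable[[str, str, str], str]) -> str:
    """Hypothetical stand-in for the loop that swaps each child for a placeholder."""
    for i, child in enumerate(children):
        if child not in parent:
            raise ValueError(ERROR)
        parent = replace(parent, child, f"# placeholder {i}")
    return parent


PARENT = (
    "if scalar:\n"
    "    def to_doy(x): return x\n"
    "else:\n"
    "    def to_doy(x): return x\n"
)
CHILDREN = ["def to_doy(x): return x", "def to_doy(x): return x"]

if __name__ == "__main__":
    try:
        skeletonize(PARENT, CHILDREN, replace_all)
    except ValueError as exc:
        print("replace_all fails:", exc)
    print(skeletonize(PARENT, CHILDREN, replace_first))
```

Python's built-in `str.replace(old, new, 1)` also replaces only the first occurrence, so it would be a shorter way to express the same idea.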
Bug Description
Parsing the irradiance.py file from the pvlib repo with CodeHierarchyNodeParser raised the error "The child text is not contained inside the parent text."
Version
llama-index-packs-code-hierarchy: 0.1.3
Steps to Reproduce
Script:
Relevant Logs/Tracebacks