In [1]:
from pinecone import Pinecone
from dotenv import load_dotenv
import sys
import os

# Add the backend directory to Python path so it can find the 'app' package
backend_dir = os.path.dirname(os.path.abspath(""))
if backend_dir not in sys.path:
    sys.path.insert(0, backend_dir)

from app.services.embeddings import EmbeddingService

load_dotenv()

text = """ ## Chapter 5. Tables
*-*-*-*
The bulk of the detailed information in a paper is typically presented in its tables. Do not overload the text with information that could be presented better in a table. As you prepare your article, consider whether a table is most appropriate.

- • If the text is crowded with detail, especially quantitative detail, consider creating a table.
- • Consolidate similar information into one table to let the reader compare easily so that the reader does not have to search for related information.
- • If a table has only a few rows and columns, try stating the findings in a few sentences. Information in small tables can often be presented better in the text.
*-*-*-*
Both tables and figures are used to support conclusions or illustrate concepts, but they have essential differences in purpose. Tables present numbers for comparison with other numbers or summarize or define concepts, terms, or other details of a study. Graphs reveal trends or delineate selected features. Sometimes the two purposes overlap, but they rarely substitute for one another. Data presented in tables should not be duplicated in graphs, and vice versa.
*-*-*-*
Readers often study tables and figures before they read the text. Therefore, each table and figure should stand alone, complete and informative in itself.
*-*-*-*
Tables are often used for reporting extensive numerical data in an organized manner. They should be self-explanatory. Number the tables in the order in which they are cited in the text.
*-*-*-*
GUIDELINES FOR PREPARING TABLES

Follow these guidelines to ensure that your tables will be prepared efficiently and accurately for typesetting, with little chance of introduced errors.

• Use Microsoft Word's table feature when creating a table. That is, the table that you create should have defined cells. DO NOT create tables by using the space bar and/or tab keys.
• Do not use the enter key within the body of the table. Instead, separate data horizontally with a new row.
• Do not insert blank columns or rows.
• Asterisks or letters next to values indicating statistical significance should appear in the same cell as the value, not an adjacent cell (i.e., they should not have their own column).
• Spell out abbreviations at first mention in tables or add an abbreviations footnote, even if the abbreviation has already been defined in the text. The reader should be able to understand the table content without referring back to the text.
• To highlight individual values in tables, you may use boldface type, italic type, or underlining. Any highlighting must have a supplemental note of explanation; attach the note symbol to the first value that is so highlighted. Do not use color or shading.
*-*-*-*
## STRUCTURE OF A TABLE

The principal parts of a table are shown in Table 5–1. The remaining tables in this chapter show the basic structure as adapted for different types of information: a typical table (Table 5–2), a table with units varying row to row (Table 5–3), a table with both measured values and analysis of variance (Table 5–4), and a table without numeric data (Table 5–5).
*-*-*-*
Copyright © ASA–CSSA–SSSA, 5585 Guilford Rd., Madison, WI 53711, USA.  
_Publications Handbook and Style Manual._
*-*-*-*
The examples are drawn from published papers; commentary for this manual is added in italics.
*-*-*-*
Keep table titles brief but sufficiently detailed to explain the data included. Typically, specify the crop or soil involved, the major variables presented, and the place and year. Do not include units of measurement; these belong in a row of their own, just beneath the column headings, or in row headings.
*-*-*-*
Each column should have a heading describing the material below it. Give units in the first row below the headings. When the same units apply to adjacent columns, state the unit only once and use em dashes on each side of the unit to indicate how many columns are included. (See Tables 5–2 and 5–4 for examples.)
*-*-*-*
The column headings should reflect the type of data shown. That is, it is not enough to state “Yield of corn.” in the table title and then label columns only with 1994, 1995, and 1996, with a units row showing Mg ha⁻¹. Add a spanner heading, "Yield," above the year headings.
*-*-*-*
When the type of data varies row to row, put the units at the end of the stub entry describing the row. Separate the units from the row descriptor with a comma or parentheses. The column headings in this kind of table do not reflect the values shown but indicate some other grouping, such as time or place or experimental conditions.
*-*-*-*
TABLE NOTES

As shown in Table 5–1, four types of notes are used with tables: a general note that applies to the entire table, a note for abbreviations, notes that show statistical significance, and notes that give specific information. The asterisks *, **, and *** are always used in this order to show statistical significance at the 0.05, 0.01, and 0.001 probability levels, respectively, and cannot be used for other notes. Significance at other levels is designated by an alternate symbol (e.g., a dagger; see also Table 4–1). Lack of significance is usually indicated by "ns" and needs a note only if the lowest level of significance shown is higher than the nonsignificance level. Example:
*-*-*-*
TABLE 5-1 Table titles should be understandable to someone who has not read the text. The table below shows the main components of a typical table in ASA, CSSA, and SSSA publications.
<table><thead><tr><th>Column heading for stub ᵃ</th><th colspan="4">Spanner head ᵇ</th></tr><tr><th></th><th>Column heading</th><th colspan="3">Subspanner head ᶜ</th></tr><tr><th></th><th></th><th>Column heading</th><th>Column heading ᵈ</th><th>Column heading ᵉ</th></tr></thead><tbody><tr><td>unit ᶠ<br>(Stub)</td><td>unit<br>(Field)</td><td colspan="3">unit</td></tr><tr><td>Stub heading</td><td colspan="4">Independent line ᵍ</td></tr><tr><td>Row heading</td><td>value 1</td><td>value 2*</td><td>value 3***</td><td>value 4*</td></tr><tr><td>Row subheading ʰ</td><td>value 5</td><td>value 6**</td><td>value 7**</td><td>value 8*</td></tr><tr><td>Row heading</td><td>value 9</td><td>value 10*</td><td>value 11**</td><td>value 12*</td></tr><tr><td>Stub heading</td><td colspan="4">Independent line ⁱ</td></tr><tr><td>Row heading</td><td>value 13</td><td>value 14</td><td>value 15**</td><td>value 16</td></tr></tbody></table>
*-*-*-*
*Note:* General note (applies to the table as a whole).
Abbreviations: List of abbreviations used in the table.
*-*-*-*
a, b, c, d, e, f, g, h, i, etc. Specific notes (on one line or each starting on a new line if that improves readability).
*Significant at the 0.05 probability level. **Significant at the 0.01 probability level. ***Significant at the 0.001 probability level.
*-*-*-*
*Table 5–2 is an example of a typical table that shows the consistent relation of the uppermost spanner heading to the units and the data values. Adapted from Saseendran et al. (1998; Agronomy Journal 90, pp. 185–190).*
*-*-*-*
TABLE 5–2  Grain and straw yield in 1993 for ‘Jaya’ rice under rainfed conditions at Kerala Agricultural University in India, as measured and as calculated using CERES-Rice v3.0.
*-*-*-*
<table><thead><tr><th rowspan="2">Date</th><th colspan="2">Grain yield</th><th colspan="2">Straw yield</th></tr><tr><th>Measured</th><th>Calculated</th><th>Measured</th><th>Calculated</th></tr><tr><th colspan="5">kg ha⁻¹</th></tr></thead><tbody><tr><td>June 8</td><td>6100</td><td>5689</td><td>4600</td><td>7785</td></tr><tr><td>June 15</td><td>300</td><td>312</td><td>100</td><td>184</td></tr><tr><td>June 22</td><td>2300</td><td>2160</td><td>14,500</td><td>16,213</td></tr><tr><td>June 29</td><td>3200</td><td>3207</td><td>4200</td><td>6743</td></tr></tbody></table>
*-*-*-*
** Significant at the 0.01 probability level.
*** Significant at the 0.001 probability level.
† ns, nonsignificant at the 0.05 probability level.
For specific notes, use superscript letters. Cite the letters just as you would read a table: from left to right and then from top to bottom, and reading across all spanner and subheadings for one column before moving on to the next. Regardless of where the asterisks first appear in a table, asterisked significance notes come after any specific notes keyed to the letters.
*-*-*-*
Mean comparisons: When letters are used to display the significance of pair-wise mean comparisons in tables or figures, the meaning of letters should be concisely described in captions. Two examples of suitable verbiage: “Means not sharing a letter are
*-*-*-*
*Table 5–3 is an example of a table with units varying row to row (unlike the usual pattern seen in Table 5–2). Adapted from Bordovsky et al. (1998; Agronomy Journal 90, pp. 638–643).*
*-*-*-*
TABLE 5-3    Surface soil (0-15 cm) properties of Miles fine sandy loam soil at Munday, TX.
<table><thead><tr><th>Property</th><th>Value</th><th>Qualifier</th></tr></thead><tbody><tr><td>Physical</td><td></td><td></td></tr><tr><td>    Soil texture, g kg⁻¹</td><td></td><td></td></tr><tr><td>        Sand</td><td>800</td><td></td></tr><tr><td>        Silt</td><td>130</td><td></td></tr><tr><td>        Clay</td><td>70</td><td></td></tr><tr><td>    Slope, % ᵃ</td><td>1</td><td></td></tr><tr><td>    Erosion factor K</td><td>0.24</td><td>medium</td></tr><tr><td>    Mean permeability, m × 10⁻⁶ s⁻¹</td><td>28</td><td>moderately rapid</td></tr><tr><td>    Mean available water capacity, m³ m⁻³</td><td>0.12</td><td>very low</td></tr><tr><td>    Mean liquid limit†</td><td>22</td><td></td></tr><tr><td>    Mean plasticity index</td><td>5</td><td></td></tr><tr><td>Chemical</td><td></td><td></td></tr><tr><td>    Mean pH</td><td>7.8</td><td>mildly alkaline</td></tr><tr><td>    Organic matter, g kg⁻¹</td><td>3.3</td><td>low</td></tr><tr><td>    Available N, mg kg⁻¹</td><td>1</td><td>very low</td></tr><tr><td>    Available P, mg kg⁻¹</td><td>52</td><td>high high</td></tr><tr><td>    Available K, mg kg⁻¹</td><td>240</td><td>high</td></tr><tr><td>    Available Ca, mg kg⁻¹</td><td>1237</td><td>high</td></tr><tr><td>    Available Mg, mg kg⁻¹</td><td>500</td><td>high</td></tr><tr><td>    Available Na, mg kg⁻¹</td><td>111</td><td>low</td></tr><tr><td>    Available S, mg kg⁻¹</td><td></td><td>high</td></tr></tbody></table>
ᵃ Source: Soil Survey of Knox County, Texas (1979).
*-*-*-*
Table 5–4 shows how to incorporate ANOVA results. The centered independent heading is used, together with the new main entry line in the stub, to alert the reader to a change in the type of data for the rows that follow. Adapted from Porter et al. (1996; Agronomy Journal 88, pp. 750–757).
*-*-*-*
TABLE 5-4   Wheat N uptake (1988) as affected by fertilizer N and indigenous soil N.
<table><thead><tr><th>Fertilizer N rate</th><th>df</th><th>Fertilizer N uptake</th><th>df</th><th>Soil N uptake</th></tr><tr><th>kg ha⁻¹</th><th></th><th>kg ha⁻¹</th><th></th><th>kg ha⁻¹</th></tr></thead><tbody><tr><td>0</td><td></td><td>–</td><td></td><td>85a</td></tr><tr><td>56</td><td></td><td>28a</td><td></td><td>67ab</td></tr><tr><td>112</td><td></td><td>47b</td><td></td><td>63b</td></tr><tr><td colspan="5">ANOVA</td></tr><tr><td>Source of variation</td><td></td><td></td><td></td><td></td></tr><tr><td>N rate (N)</td><td>1</td><td>***</td><td>2</td><td>*</td></tr><tr><td>Microplot (M)</td><td>3</td><td>NS†</td><td>3</td><td>NS</td></tr><tr><td>N × M</td><td>3</td><td>NS</td><td>6</td><td>NS</td></tr><tr><td>CV, %</td><td></td><td>22</td><td></td><td>16</td></tr></tbody></table>
Note: Means not sharing a letter are significantly different at the 5% level of significance according to a t-test.
*Significant at the .05 probability level. ***Significant at the .001 probability level. †NS, nonsignificant.
*-*-*-*
*As shown in Table 5–5, sometimes a table is the best way to organize words. Adapted from Einhellig (1996; Agronomy Journal 88, pp. 886–893).*
*-*-*-*
TABLE 5-5 Studies reporting stress enhancement of the action of allelopathic chemicals.
<table><thead><tr><th>Stress</th><th>Bioassay</th><th>Species</th><th>Allelochemical</th><th>Reference</th></tr></thead><tbody><tr><td>High temperature</td><td>SG</td><td>soybean; grain sorghum</td><td>ferulic acid</td><td>Einhellig and Eckrich (1984)</td></tr><tr><td>High temperature</td><td>plantlets</td><td>barley</td><td>gramine</td><td>Hanson et al. (1983)</td></tr><tr><td>Low nutrients</td><td>RE</td><td>barley</td><td>phenolic acids</td><td>Glass (1976)</td></tr><tr><td>Low N or P</td><td>RE</td><td>barley</td><td>p-coumaric acid; vanillic acid</td><td>Stowe and Osborn (1980)</td></tr><tr><td>Low N or K</td><td>SG</td><td>Schizachyrium scoparium</td><td>hydrocinnamic acid</td><td>Williamson et al. (1992)</td></tr><tr><td>Moisture stress</td><td>G, SG</td><td>grain sorghum</td><td>ferulic acid</td><td>Einhellig (1987, 1989)</td></tr></tbody></table>
Abbreviations: G, germination; RE, root elongation; SG, seedling growth.
*-*-*-*
significantly different at the 5% level of significance according to a t-test” or “Means with a letter in common are not significantly different at the 5% level according to Tukey’s HSD test.” Also see Piepho (2018), "Letters in mean comparisons: What they do and don't mean," *Agronomy Journal* 110, 431–434.
*-*-*-*
If individual values in a table are highlighted using italic or bold type or underlining, attach the note symbol to the first value that is so highlighted. If standard errors or standard deviations are included, either in parentheses or with ±, attach the note symbol to the first value that includes this addition.
*-*-*-* """

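# Split on the chunk separator and skip whitespace-only pieces (the text ends
# with a separator, so the final split piece would otherwise be a blank chunk).
chunked_text = [chunk for chunk in text.split("*-*-*-*") if chunk.strip()]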

for chunk in chunked_text:
    print(chunk)
    print("****" * 10)


 ## Chapter 5. Tables

****************************************

The bulk of the detailed information in a paper is typically presented in its tables. Do not overload the text with information that could be presented better in a table. As you prepare your article, consider whether a table is most appropriate.

- • If the text is crowded with detail, especially quantitative detail, consider creating a table.
- • Consolidate similar information into one table to let the reader compare easily so that the reader does not have to search for related information.
- • If a table has only a few rows and columns, try stating the findings in a few sentences. Information in small tables can often be presented better in the text.

****************************************

Both tables and figures are used to support conclusions or illustrate concepts, but they have essential differences in purpose. Tables present numbers for comparison with other numbers or summarize or define concepts, terms, or ot
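
The source text also carries page numbers and repeated copyright footers as separate chunks, which would otherwise be embedded alongside the real content. The following is a minimal sketch of one way to screen them out before embedding. The names `is_boilerplate` and `clean_chunks` and both patterns are hypothetical, written for this handbook excerpt only; they are not part of `EmbeddingService`.

```python
import re

# Hypothetical patterns for chunks that carry no content worth indexing.
# Matches bare page numbers such as "5-1" or "$5 - 2$".
PAGE_NUMBER = re.compile(r"^\$?\s*\d+\s*[-–]\s*\d+\s*\$?$")
FOOTER_PREFIX = "Copyright ©"


def is_boilerplate(chunk: str) -> bool:
    """Return True for empty chunks, bare page numbers, and copyright footers."""
    stripped = chunk.strip()
    if not stripped:
        return True
    if PAGE_NUMBER.match(stripped):
        return True
    return stripped.startswith(FOOTER_PREFIX)


def clean_chunks(raw_text: str, separator: str = "*-*-*-*") -> list[str]:
    """Split on the separator, strip whitespace, and drop boilerplate chunks."""
    return [c.strip() for c in raw_text.split(separator) if not is_boilerplate(c)]


sample = "## Title\n*-*-*-*\nBody text.\n*-*-*-*\n$5 - 2$\n*-*-*-*\nCopyright © X\n*-*-*-* "
print(clean_chunks(sample))
```

The patterns are anchored to the whole chunk, so a table title such as "TABLE 5-1 ..." is kept; only chunks that consist of nothing but a page number or a footer are dropped.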

In [2]:
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("labverse")


# Get embedding for the text
embedding_service = EmbeddingService()
print("initialized embedding service")
embeddings = embedding_service.embed_batch(chunked_text)
print("done embedding")
embedding_service.store_embeddings(embeddings)

initialized embedding service
embedding batch
embedded text  ## Chapter 5. Tables

embedded text 
The bulk of the detailed information in a paper is typically presented in its tables. Do not overload the text with information that could be presented better in a table. As you prepare your article, consider whether a table is most appropriate.

- • If the text is crowded with detail, especially quantitative detail, consider creating a table.
- • Consolidate similar information into one table to let the reader compare easily so that the reader does not have to search for related information.
- • If a table has only a few rows and columns, try stating the findings in a few sentences. Information in small tables can often be presented better in the text.

embedded text 
Both tables and figures are used to support conclusions or illustrate concepts, but they have essential differences in purpose. Tables present numbers for comparison with other numbers or summarize or define concepts, terms,
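
The internals of `store_embeddings` are not shown in this notebook. As a rough illustration only, the sketch below pairs each chunk with its vector in the `id` / `values` / `metadata` dictionary shape that Pinecone's `upsert` accepts. The helper name `build_records` and the content-hash ID scheme are assumptions; the `text` metadata key is taken from the lookup `matches[0]['metadata']['text']` used later in the notebook.

```python
import hashlib


def build_records(chunks, vectors):
    """Pair each chunk with its vector as a Pinecone-style upsert record."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    records = []
    for chunk, vector in zip(chunks, vectors):
        # A content hash gives a stable ID, so re-running the cell overwrites
        # the same records instead of adding duplicates (an assumed scheme).
        record_id = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:32]
        records.append(
            {"id": record_id, "values": list(vector), "metadata": {"text": chunk}}
        )
    return records


demo = build_records(["Tables should be self-explanatory."], [[0.1, 0.2, 0.3]])
print(demo[0]["metadata"]["text"])
# With a live index, the records could then be sent with: index.upsert(vectors=demo)
```

Deriving the ID from the chunk text is one possible design choice; a running counter or a document-and-position key would work as well, as long as the IDs stay stable between runs.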

In [None]:
query = "What is the best way to present data in a table?"
# Embed the query once and reuse the result: index 0 is the vector, index 1 the text.
query_result = embedding_service.embed_text(query)
query_embedding = query_result[0]
print("query embedding: ", query_embedding)
print("query text: ", query_result[1])
matches = embedding_service.search_similar(query_embedding, top_k=1)

query embedding [-0.015868909657001495, -0.004938856698572636, 0.07450192421674728, 0.010840029455721378, 0.032582174986600876, -0.018898654729127884, -0.012025851756334305, -0.00923823844641447, -0.03593476116657257, 0.029924938455224037, 0.006450624670833349, -0.0021621077321469784, 0.0030390575993806124, -0.005361034069210291, 0.024411795660853386, 0.0026618915144354105, 0.03283051401376724, -0.010914531536400318, -0.003784076776355505, 0.05989954620599747, -0.008853311650454998, 0.03777247294783592, 0.02930408902466297, 0.03109213523566723, 0.034593723714351654, 0.06193593144416809, -0.01716027595102787, 0.019271163269877434, 0.02273550257086754, 0.020736368373036385, -0.022300908342003822, -0.027814051136374474, 0.018066715449094772, -0.001651459257118404, 0.018873820081353188, -0.019569171592593193, 0.00916373636573553, 0.01339792925864458, -0.010995241813361645, -0.0072142696008086205, 0.04641469568014145, 0.045843515545129776, 0.02592666819691658, 0.04410513862967491, 0.0742535

In [10]:
print(len(matches))
print(matches[0]['metadata']['text'])

1

Tables are often used for reporting extensive numerical data in an organized manner. They should be self-explanatory. Number the tables in the order in which they are cited in the text.
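
To make the retrieval step concrete, here is a small local sketch of a top-k similarity search over in-memory records. It assumes cosine similarity as the metric; the metric actually configured for the "labverse" index is not shown in this notebook, and `search_similar` may behave differently. The names `cosine_similarity`, `top_k`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, records, k=1):
    """Score every record against the query and return the k best matches."""
    scored = [
        {"score": cosine_similarity(query_vector, r["values"]), "metadata": r["metadata"]}
        for r in records
    ]
    return sorted(scored, key=lambda m: m["score"], reverse=True)[:k]


toy_records = [
    {"values": [1.0, 0.0], "metadata": {"text": "chunk about tables"}},
    {"values": [0.0, 1.0], "metadata": {"text": "chunk about figures"}},
]
toy_matches = top_k([0.9, 0.1], toy_records, k=1)
print(toy_matches[0]["metadata"]["text"])
```

A brute-force scan like this is only practical for a handful of vectors; a hosted index performs the equivalent ranking with approximate nearest-neighbour search.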

