Extract endnotes as well #19
Generally this just works, but in your https://github.com/rmzelle/ref-extractor/files/1595117/Dok9-endnotes2.docx document one of the endnotes is split over multiple w:r elements:
<w:r>
<w:instrText xml:space="preserve">
ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"oI6eP5QH","properties":{"formattedCitation":"{\\rtf Paras Mandal u.\\uc0\\u160{}a., \\uc0\\u8222{}A Novel Approach to Forecast Electricity Price for PJM Using Neural Network and Similar Days
Method\\uc0\\u8220{}, {\\i{}IEEE Transactions on Power Systems} 22, Nr. 4 (November 2007): 5, https://doi.org/10.1109/TPWRS.2007.907386; Safder Alladina, \\uc0\\u8222{}Second Language Teaching through Maths: Learning Maths through a Second
Language\\uc0\\u8220{}, {\\i{}Educational Studies in Mathematics} 16, Nr. 2 (1. Mai 1985): 215\\uc0\\u8211{}19.}","plainCitation":"Paras Mandal u. a., „A Novel Approach to Forecast Electricity Price for PJM Using Neural Network and Similar Days
Method“, IEEE Transactions on Power Systems 22, Nr. 4 (November 2007): 5, https://doi.org/10.1109/TPWRS.2007.907386; Safder Alladina, „Second Language Teaching through Maths: Learning Maths through a Second Language“, Educational Studies in
Mathematics 16, Nr. 2 (1. Mai 1985): 215–19."},"citationItems":[{"id":11835,"uris":["http://zotero.org/users/96641/items/ANPFGCCI"],"uri":["http://zotero.org/users/96641/items/ANPFGCCI"],"itemData":{"id":11835,"type":"article-journal","title":"A
Novel Approach to Forecast Electricity Price for PJM Using Neural Network and Similar Days Method","container-title":"IEEE Transactions on Power Systems","page":"2058-2065","volume":"22","issue":"4","source":"EBSCOhost","abstract":"Price forecasting
in competitive electricity markets is critical for consumers and producers in planning their operations and managing their price risk, and it also plays a key role in the economic optimization of the electric energy industry. This paper explores a
technique of artificial neural network (ANN) model based on similar days (SD) method in order to forecast day-ahead electricity price in the PJM market. To demonstrate the superiority of the proposed model, publicly available data acquired from the
PJM Interconnection were used for training and testing the ANN. The factors impacting the electricity price forecasting, including time factors, load factors, and historical price factors, are discussed. Comparison of forecasting performance of the
proposed ANN model with that of forecasts obtained from similar days method is presented. Daily and weekly mean absolu</w:instrText>
</w:r>
<w:r w:rsidRPr="00993842">
<w:rPr><w:lang w:val="en-US"/></w:rPr>
<w:instrText xml:space="preserve">te percentage error (MAPE) of reasonably small value and forecast mean square error (FMSE) of less than 7$/MWh were obtained for the PJM data, which has correlation coefficient of determination (R²) of 0.6744 between
load and electricity price. Simulation results show that the proposed ANN model based on similar days method is capable of forecasting locational marginal price (LMP) in the PJM market efficiently and
accurately.","DOI":"10.1109/TPWRS.2007.907386","ISSN":"08858950","journalAbbreviation":"IEEE Transactions on Power
Systems","author":[{"family":"Mandal","given":"Paras"},{"family":"Senjyu","given":"Tomonobu"},{"family":"Urasaki","given":"Naomitsu"},{"family":"Funabashi","given":"Toshihisa"},{"family":"Srivastava","given":"Anurag
K."}],"issued":{"date-parts":[["2007",11]]}},"locator":"5"},{"id":1074,"uris":["http://zotero.org/users/96641/items/5DNR6EWT"],"uri":["http://zotero.org/users/96641/items/5DNR6EWT"],"itemData":{"id":1074,"type":"article-journal","title":"Second
Language Teaching through Maths: Learning Maths through a Second Language","container-title":"Educational Studies in Mathematics","page":"215-219","volume":"16","issue":"2","source":"JSTOR","ISSN":"0013-1954","shortTitle":"Second Language Teaching
through Maths","journalAbbreviation":"Educational Studies in
Mathematics","author":[{"family":"Alladina","given":"Safder"}],"issued":{"date-parts":[["1985",5,1]]}}}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"}
</w:instrText>
</w:r>
(in contrast, the other endnotes in the document are stored in a single w:instrText element:
<w:instrText xml:space="preserve">
ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"kxO9MnbO","properties":{"formattedCitation":"{\\rtf vgl. {\\i{}Returns to Education in West Germany over Time\\uc0\\u8239{}: Educational Expansion, Occupational Upgrading and the Job Matching Process},
2013.}","plainCitation":"vgl. Returns to Education in West Germany over Time : Educational Expansion, Occupational Upgrading and the Job Matching Process,
2013."},"citationItems":[{"id":795,"uris":["http://zotero.org/users/96641/items/2RDF5IIE"],"uri":["http://zotero.org/users/96641/items/2RDF5IIE"],"itemData":{"id":795,"type":"book","title":"Returns to education in West Germany over time : educational
expansion, occupational upgrading and the job matching process","number-of-pages":"379","source":"Primo","abstract":"Mannheim, Univ., Diss., 2013","shortTitle":"Returns to education in West Germany over
time","language":"en","author":[{"family":"Klein","given":"Markus"}],"issued":{"date-parts":[["2013"]]}},"suppress-author":true,"prefix":"vgl."}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"}
</w:instrText>
)
Another test shows that ...
<w:r>
<w:instrText xml:space="preserve">... 100 M</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" w:cs="Cambria Math"/>
</w:rPr>
<w:instrText>⊙</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:instrText>M</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" w:cs="Cambria Math"/>
</w:rPr>
<w:instrText>⊙</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:instrText>M</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" w:cs="Cambria Math"/>
</w:rPr>
<w:instrText>⊙</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:instrText>M</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" w:cs="Cambria Math"/>
</w:rPr>
<w:instrText>⊙</w:instrText>
</w:r>
<w:r w:rsidR="00EE6AF0">
<w:instrText xml:space="preserve">...</w:instrText>
</w:r>
However, it seems that you can concatenate all
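A minimal sketch of gluing split field codes back together, assuming the pieces of one citation are simply consecutive w:instrText elements under a common container. The function names `join_instr_text` and `parse_zotero_field` are hypothetical and not part of ref-extractor; the prefix string is the one visible in the snippets above.

```python
import json
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace bound to the "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ZOTERO_PREFIX = "ADDIN ZOTERO_ITEM CSL_CITATION"


def join_instr_text(container_xml):
    """Concatenate the text of every w:instrText below the root, in document order."""
    root = ET.fromstring(container_xml)
    return "".join(node.text or "" for node in root.iter(f"{{{W_NS}}}instrText"))


def parse_zotero_field(instr):
    """Return the CSL citation object embedded in a Zotero field code, or None."""
    instr = instr.strip()
    if not instr.startswith(ZOTERO_PREFIX):
        return None  # some other field type, not a Zotero citation
    return json.loads(instr[len(ZOTERO_PREFIX):])
```

Joining before parsing matters because a split can fall in the middle of a JSON string (as in "absolu" / "te" above), so no single piece is valid JSON on its own.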
Sure. I'm just wondering if this also happens for footnotes or in-text citations. For endnotes the logic to glue the pieces back together looks reasonably clear, since each endnote is wrapped in its own element.
Anyway, I'll just merge this for some endnote support, even if it doesn't work perfectly yet.
I tried out both cases. Yes, the same seems to happen for footnotes and in-text citations: Dok9-footnotes2.docx
Do you need more examples?
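To compare the three cases side by side, the relevant XML can be pulled straight out of the .docx archive. This is a rough sketch that assumes the default part names Word normally uses (`word/document.xml`, `word/footnotes.xml`, `word/endnotes.xml`); `read_story_parts` is a hypothetical helper, not something from ref-extractor.

```python
import zipfile

# Default part names for body text, footnotes and endnotes (an assumption:
# strictly, part names are defined by the package relationships).
CANDIDATE_PARTS = ("word/document.xml", "word/footnotes.xml", "word/endnotes.xml")


def read_story_parts(docx):
    """Return {part name: raw XML bytes} for each candidate part present in the archive.

    `docx` may be a file path or a binary file-like object.
    """
    parts = {}
    with zipfile.ZipFile(docx) as archive:
        present = set(archive.namelist())
        for name in CANDIDATE_PARTS:
            if name in present:
                parts[name] = archive.read(name)
    return parts
```

A document without endnotes typically has no endnotes part at all, which is why missing parts are skipped rather than treated as errors.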
I think this is enough, although I don't know enough about the .docx format to know how to reliably extract these split citations for author-date styles. In your example, it looks like:
<w:p w:rsidR="005002F6" w:rsidRDefault="00BD3B66">
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r>
<w:rPr><w:lang w:val="en-US"/></w:rPr>
<w:instrText xml:space="preserve">
ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"oI6eP5QH","properties":{"formattedCitation":"{\\rtf (Mandal u.\\uc0\\u160{}a. 2007, 5; Alladina 1985)}","plainCitation":"(Mandal u. a. 2007, 5; Alladina
1985)"},"citationItems":[{"id":11835,"uris":["http://zotero.org/users/96641/items/ANPFGCCI"],"uri":["http://zotero.org/users/96641/items/ANPFGCCI"],"itemData":{"id":11835,"type":"article-journal","title":"A Novel Approach to Forecast Electricity
Price for PJM Using Neural Network and Similar Days Method","container-title":"IEEE Transactions on Power Systems","page":"2058-2065","volume":"22","issue":"4","source":"EBSCOhost","abstract":"Price forecasting in competitive electricity markets is
critical for consumers and producers in planning their operations and managing their price risk, and it also plays a key role in the economic optimization of the electric energy industry. This paper explores a technique of artificial neural network
(ANN) model based on similar days (SD) method in order to forecast day-ahead electricity price in the PJM market. To demonstrate the superiority of the proposed model, publicly available data acquired from the PJM Interconnection were used for
training and testing the ANN. The factors impacting the electricity price forecasting, including time factors, load factors, and historical price factors, are discussed. Comparison of forecasting performance of the proposed ANN model with that of
forecasts obtained from similar days method is presented. Daily and weekly mean absolu</w:instrText>
</w:r>
<w:r w:rsidRPr="00BD3B66">
<w:instrText xml:space="preserve">te percentage error (MAPE) of reasonably small value and forecast mean square error (FMSE) of less than 7$/MWh were obtained for the PJM data, which has correlation coefficient of determination (R²) of 0.6744 between
load and electricity price. Simulation results show that the proposed ANN model based on similar days method is capable of forecasting locational marginal price (LMP) in the PJM market efficiently and
accurately.","DOI":"10.1109/TPWRS.2007.907386","ISSN":"08858950","journalAbbreviation":"IEEE Transactions on Power
Systems","author":[{"family":"Mandal","given":"Paras"},{"family":"Senjyu","given":"Tomonobu"},{"family":"Urasaki","given":"Naomitsu"},{"family":"Funabashi","given":"Toshihisa"},{"family":"Srivastava","given":"Anurag
K."}],"issued":{"date-parts":[["2007",11]]}},"locator":"5"},{"id":1074,"uris":["http://zotero.org/users/96641/items/5DNR6EWT"],"uri":["http://zotero.org/users/96641/items/5DNR6EWT"],"itemData":{"id":1074,"type":"article-journal","title":"Second
Language Teaching through Maths: Learning Maths through a Second Language","container-title":"Educational Studies in Mathematics","page":"215-219","volume":"16","issue":"2","source":"JSTOR","ISSN":"0013-1954","shortTitle":"Second Language Teaching
through Maths","journalAbbreviation":"Educational Studies in
Mathematics","author":[{"family":"Alladina","given":"Safder"}],"issued":{"date-parts":[["1985",5,1]]}}}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"}
</w:instrText>
</w:r>
...
</w:p>
My main question is whether the structure here is always:
<w:p>
<w:r>
<w:instrText>...</w:instrText>
</w:r>
<w:r>
<w:instrText>...</w:instrText>
</w:r>
...
</w:p>
or whether the outer element can ever be something other than w:p.
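One way to avoid depending on the answer is to ignore the parent element entirely and group the pieces by the field markers instead. The sketch below assumes every field is delimited by w:fldChar elements with w:fldCharType "begin" and "end", as in the example above; `collect_field_codes` is a hypothetical name, and this is not how ref-extractor is known to do it.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FLD_CHAR = f"{{{W_NS}}}fldChar"
FLD_CHAR_TYPE = f"{{{W_NS}}}fldCharType"
INSTR_TEXT = f"{{{W_NS}}}instrText"


def collect_field_codes(xml_text):
    """Return one joined instruction string per field, in the order the fields close."""
    codes = []
    stack = []  # one list of text pieces per currently open (possibly nested) field
    # iter() walks all descendants in document order, whatever their parents are.
    for node in ET.fromstring(xml_text).iter():
        if node.tag == FLD_CHAR:
            kind = node.get(FLD_CHAR_TYPE)
            if kind == "begin":
                stack.append([])
            elif kind == "end" and stack:
                codes.append("".join(stack.pop()))
        elif node.tag == INSTR_TEXT and stack:
            stack[-1].append(node.text or "")
    return codes
```

Because the grouping relies only on document order and the begin/end markers, it gives the same result whether the runs sit directly under w:p or inside some other wrapper element.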
(and I created a dedicated ticket for this, per above)
@zuphilip, by the way, would it be okay if I add the Word documents you shared to the repo itself? It would be good to have some test data available. I checked one with https://www.get-metadata.com/ and it doesn't look like it contains anything sensitive from a privacy standpoint.
@rmzelle Yes, that should be no problem. As long as you don't look at the referenced data too closely 😄 (it is a random subset of my Zotero library with varying metadata quality)
Follow-up of #17, to extract Zotero citations from endnotes as well. (cc @zuphilip)