SVG HELP! #28

Closed
nlottig94 opened this issue Dec 15, 2015 · 24 comments

@nlottig94
Owner

@ebeshero @ghbondar So I'm helping @brookestewart with the SVG. I am trying to find the total number of dashes (—) in each original Dickinson poem. This is the XPath that I came up with to search for them: count(//rdg[@wit[contains(., '#df16') or contains(., 'var0') or contains(., 'var1') or contains(., 'var2')]][contains(., '—')])
Sorry it's super long! It seemed to me that it would work correctly. I counted the dashes myself in poem 1, and it should be giving me 21; however, it is only giving me 17...any suggestions???

@ebeshero
Collaborator

@nlottig94 Before I check the poem itself, tell me if there is a chance a dash character could be sitting outside an <rdg> element?

@nlottig94
Owner Author

I don't believe there are.

@ebeshero
Collaborator

Just took a look at poem 1, and you're right, they're all inside rdg elements. Actually I think you're missing some dashes because you're not actually counting the dashes themselves, but instead the parent elements that contain them! A few of these elements probably contain multiple dashes.

So you want to be counting the dash characters directly, in any elements' text node. Let's see if we can figure out a way...
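
A minimal sketch of the difference between counting elements and counting characters, assuming XSLT 2.0 and that the poems are in the TEI namespace (the xpath-default-namespace value is that assumption). It leaves out the @wit filter from the original XPath, so it looks at every rdg in the file:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Counts ELEMENTS: an rdg holding three dashes still adds only 1. -->
        <xsl:text>rdg elements containing a dash: </xsl:text>
        <xsl:value-of select="count(//rdg[contains(., '&#8212;')])"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Counts CHARACTERS: join the text nodes inside rdg elements,
             turn the string into codepoints, keep those equal to 8212 (U+2014). -->
        <xsl:variable name="rdgText" select="string-join(//rdg//text(), '')"/>
        <xsl:text>dash characters inside rdg elements: </xsl:text>
        <xsl:value-of select="count(string-to-codepoints($rdgText)[. = 8212])"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

If the two numbers differ for a poem, at least one rdg holds more than one dash, which would account for a gap like 17 versus 21.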

@nlottig94
Owner Author

Okay, I'll look at it tomorrow. I'm not at my computer right now.

@alexthattalks
Collaborator

Oops, didn't mean to do that! I was typing a similar thought to @ebeshero and hit the wrong button lol

@ebeshero
Collaborator

I'm checking Michael Kay, and I think what we want is analyze-string--but let me check...

@ebeshero
Collaborator

Read about <xsl:analyze-string> in Michael Kay, pages 230+...and some examples around 907-908...

I think I know how to write this...

@ebeshero
Collaborator

@nlottig94 @brookestewart @amielnicki I tried a few strategies last night and this morning, and it's not quite right! But first of all, I made some corrections to your xsl:stylesheet so it would read TEI, and I found a reliable strategy for plotting the x position of your dots. What's not working is counting the hyphens...yet. I'm confident we'll figure it out, but I pushed what I have so far:

See: https://github.com/ebeshero/EmilyDickinson16/blob/master/xslt_to_SVG.xsl

You'll see that I've experimented with a couple of variables, and I need to explain to you how this works if I can get it to work. There may well be a simpler way to do this, so I'm going to ask for a little help myself.
@djbpitt

@ebeshero
Collaborator

@djbpitt Hi David! I wonder if you can help us here: We're looking for a good strategy to do the following on the Dickinson project:

  • The Dickinson team is working on preparing SVG graphs to compare how much the different editions reduced Emily Dickinson's hyphens. They're running XSLT over a collection of files (using collection()) in the directory in this GitHub repository called "Dickinson", and for right now, they're just working on counting the hyphens wherever they appear in each poem.
  • Each of the 11 TEI files in the Dickinson directory or collection contains a poem inside its <body> element, and inside those poems (nearly always inside an <rdg> element) are hyphens which we've indicated with this special character: —
    The team did not mark those hyphens with anything (didn't use the <punct> element for example).
  • I thought we could grab and count the hyphens with <xsl:analyze-string>, and I'm running into some interesting snags, possibly to do with our use of collection(), though I'm not sure. The strategy I tried is basically this:
  1. Define a variable named $hyphenX in XSLT, and make its value dependent on applying templates to the <rdg> elements in each TEI file. (Really I think we want to run this over any element whose text node contains the hyphen character, and that's how I first wrote it, but I retrenched to reach only into the <rdg> elements when I wasn't generating good output.) Note my positioning of this, which could be problematic: I've set this inside an xsl:for-each that loops through each file in the collection, generates some variables, and then draws our circles for dots on the SVG graph.

  2. In the template rule matching on rdg[contains(., '&#8212;')], I am attempting to use <xsl:analyze-string> to reach in and internally match on a hyphen character, and then to output a character, an arbitrary letter X. I'm doing that so I can construct a string of X's: so that every time the parser finds a hyphen, it outputs an X, and starts building a little string of these. That string of X's should be the same string-length as the number of hyphens in the document (at least if I've understood this right).

  3. Then back up in the xsl:for-each looping through my poems, I create another variable called $hyphenCount that calculates the string-length() of the string of X's I think I should have generated.

The problem is: Apparently I am generating lots of X's, and it looks like a different number for each poem, but I'm getting much too many: in the 400s through 600s, when the numbers should be more like the 20s to 30s. Can anyone help figure out what's going wrong here? I bet it's something to do with overcalculating or getting repeated counts somehow--some way in which I've framed how our looping through the poems and/or the rdg elements works.

It may be my strategy is too complex, so maybe let's see if we can figure out a simpler way to do this!
(In working out this strategy, I'm patching together some advice (see, among others, http://stackoverflow.com/questions/6679705/count-the-number-of-occurrences-of-a-string-in-xml-using-xslt ) on how this kind of thing used to be done with <xsl:call-template> in XSLT 1.0. From reading Michael Kay (p. 271), I don't think we need that any more: <xsl:analyze-string> ought to be able to do much the same thing, so this is my attempt at it.)

@djbpitt

djbpitt commented Dec 15, 2015

Dear Elisa and everyone else,

It looks as if you're defining $hyphenX as equal to a <rdg> element and
then $hyphenCount as the string-length() of that entire <rdg>. I think what
you want in order to count the hyphens in the <rdg> is to set $hyphenCount to
something like:

string-length($hyphenX) - string-length(translate($hyphenX,'-',''))

This strips the hyphens from $hyphenX (on the right) and subtracts its
length without hyphens from its length with hyphens, and the difference
should be the count of hyphens.
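
A worked sketch of this subtraction, assuming XSLT 2.0 (the same XPath also runs in 1.0). The $sample string is made up for illustration, and because the project's files use the em dash (U+2014) rather than an ASCII hyphen, the sketch passes &#8212; to translate():

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Hypothetical sample text: 16 letters and 4 em dashes. -->
        <xsl:variable name="sample"
            select="'blah&#8212;blah&#8212;blah&#8212;&#8212;blah'"/>
        <!-- Length with the dashes still in place: 20. -->
        <xsl:variable name="withDashes" select="string-length($sample)"/>
        <!-- translate() with an empty third argument deletes every dash: 16. -->
        <xsl:variable name="withoutDashes"
            select="string-length(translate($sample, '&#8212;', ''))"/>
        <!-- The difference is the number of dashes: 4. -->
        <xsl:value-of select="$withDashes - $withoutDashes"/>
    </xsl:template>
</xsl:stylesheet>
```

The stylesheet ignores its input, so any well-formed XML file will do as the source document.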

I just peeked at that bit of XSLT code, so I haven't explored the rest or
tried to run the transformation, but I hope this is nonetheless helpful.

Best,

David


@ebeshero
Collaborator

I'm finally looking back at this after a busy day! A quick correction for everyone: We should be calling these dashes (not hyphens). We're using a dash character. In your write-ups on Dickinson for the site, you really don't want to call these hyphens!

@ebeshero
Collaborator

@djbpitt Okay, a follow-up question: When I write this code:

<xsl:template match="rdg[contains(., '&#8212;')]">
        <xsl:analyze-string select="." regex="&#8212;">
            <xsl:matching-substring >
             <xsl:text>X</xsl:text>   
            </xsl:matching-substring>

        </xsl:analyze-string>
    </xsl:template>

should it not reach into the <rdg> element, analyze the string there for the dash character, and THEN, in <xsl:matching-substring>, output a letter X every time it encounters the dash (identified with the @regex in <xsl:analyze-string>)? I suspect with you that this is not happening and instead I'm outputting X's for the entire string of characters in the rdg element, which means analyze-string isn't doing what I think it should be doing, but I don't know why not. I suppose my dash character isn't a literal regex but an actual Unicode character, and I wondered if it wouldn't work because of that, but...I think it ought to isolate those dashes unless I'm misunderstanding how xsl:matching-substring works(?)

@ebeshero
Collaborator

@djbpitt @nlottig94 @brookestewart @amielnicki Thanks to David's advice I worked out an alternative solution, which I've commented on in this commit: 0e1de33

If you can follow what I've started here, I think you'll be in good shape to continue with calculating your percentages. Beware of my testOutput SVG, which looks pretty but isn't correct! It's just outputting raw counts multiplied by a $y-interval, and this has nothing to do with your y-axis as you've plotted it.

@djbpitt

djbpitt commented Dec 16, 2015

Given this XML:

<p>blah—blah—blah——blah</p>

and this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">
    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="p">
        <xsl:analyze-string select="." regex="—">
            <xsl:matching-substring>
                <xsl:text>X</xsl:text>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

I get this result:

XXXX

which is what I expect: one 'X' for each em-dash. Is that not what you get?


@ebeshero
Collaborator

@djbpitt That's what I'd expect to get, and why I thought my strategy with xsl:analyze-string would work. But instead it counted way too many characters--in the 400s and 600s, not the 20s or 30s we expect. My hunch is it had to do with my positioning of the variable definition within xsl:for-each, but that is set to look at each of the 11 files one at a time. I wonder if reaching into a new template was looking through the poem plus the whole collection somehow?

I am glad that my strategy works in principle, but the practice of deploying it over a collection from within for-each seems to multiply the dashes! I wish I knew why--but do you think it's to do with the collection()? In other words, what if you had a collection of 11 files like this?

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <p>blah—blah—blah——blah</p>
</root>

The XSLT I tried was something like this (pared down without SVG stuff):

<xsl:variable name="DickinsonColl" select="collection('Dickinson')"/>

<xsl:template match="/">
    <xsl:for-each select="$DickinsonColl//TEI">
        <xsl:variable name="dashX">
            <xsl:apply-templates select=".//rdg"/>
        </xsl:variable>
        <xsl:text>Dash Count! </xsl:text>
        <xsl:value-of select="string-length($dashX)"/>
    </xsl:for-each>
</xsl:template>

<xsl:template match="rdg[contains(., '&#8212;')]">
    <xsl:analyze-string select="." regex="&#8212;">
        <xsl:matching-substring>
            <xsl:text>X</xsl:text>
        </xsl:matching-substring>
    </xsl:analyze-string>
</xsl:template>

Why do I generate such massive string-lengths with this? I don't think it would be giving me the length of the rdg elements which actually don't contain hundreds of characters...they are quite short--part of a Dickinson line. I think the problem must be with the template match on rdg, so my new solution doesn't involve a new template.
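
One possible explanation, offered as a guess that nobody in this thread confirmed: the apply-templates call selects every rdg, but the template only matches an rdg that contains a dash. Any rdg without a dash falls through to XSLT's built-in template rules, which copy its text into the output, so $dashX would hold X's plus the full text of every dash-free reading. The sketch below avoids that by giving all rdg elements a template in a dedicated mode. The mode name "dashes" and the TEI namespace declaration are assumptions, and the collection('Dickinson') call is carried over unchanged from the code above:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
    <xsl:output method="text"/>
    <!-- Collection reference as used in the thread; adjust to your processor. -->
    <xsl:variable name="DickinsonColl" select="collection('Dickinson')"/>

    <xsl:template match="/">
        <xsl:for-each select="$DickinsonColl//TEI">
            <xsl:variable name="dashX">
                <!-- Hypothetical mode name, so no other template interferes. -->
                <xsl:apply-templates select=".//rdg" mode="dashes"/>
            </xsl:variable>
            <xsl:text>Dash Count! </xsl:text>
            <xsl:value-of select="string-length($dashX)"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

    <!-- Matches EVERY rdg, so none reaches the built-in rules. With no
         xsl:non-matching-substring, text that is not a dash produces nothing. -->
    <xsl:template match="rdg" mode="dashes">
        <xsl:analyze-string select="." regex="&#8212;">
            <xsl:matching-substring>
                <xsl:text>X</xsl:text>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

A caveat: if an rdg ever sits inside another rdg, .//rdg selects both and the inner dashes are counted twice, so this sketch only suits files where readings do not nest.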

@ebeshero
Collaborator

@djbpitt @nlottig94 @brookestewart @amielnicki See my latest commit of the XSLT file.
I just figured out how to get <xsl:analyze-string> to work, and it outputs the same (good) results as the first way that worked yesterday. I want to show you Dickinsonians both approaches so you can decide which one you want to keep using. I imagine that you'll need to keep making calculations to study alterations of various kinds or dash removal, etc., or simply to think about differences in string-length() within the published versions.

If you understand how <xsl:analyze-string> works (read about it in Michael Kay), you might want to try using it to isolate other kinds of punctuation marks coded in your elements to help you compare the kinds of punctuation alterations you're seeing the various editors make. Some notes:

  1. Its @regex attribute doesn't need to be an elaborate regular expression: you can set it to a single literal character such as a comma or a dash. Just remember that a few characters are regex metacharacters: an unescaped period (.) matches any character, so write \. to match a literal period.

  2. Inside <xsl:analyze-string> I've been using <xsl:matching-substring> to isolate one character and process it in some way. Think about your strategy for studying the published editions and how they compare to this. You may want to experiment more with the following:

  • string-length(): Consider how the string-length() of one reading compares with the string-length() of Dickinson's original. Complication: you'll want to include Dickinson's variants (inside their own <rdg> elements) to get the full picture of how much is cut or added.
  • Within <xsl:analyze-string> try using <xsl:non-matching-substring> to process everything that does NOT match your @regex (including getting its string-length, among other things).
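
A rough sketch of both branches, assuming XSLT 2.0. The $sample string and the variable names are made up for illustration and are not taken from the project's stylesheet. It isolates periods, so the regex is the escaped form \. rather than a bare period:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Made-up sample text, not a line from the project's TEI files. -->
        <xsl:variable name="sample" select="'blah&#8212;blah, blah. blah.'"/>
        <!-- One X for each literal period found. -->
        <xsl:variable name="periods">
            <xsl:analyze-string select="$sample" regex="\.">
                <xsl:matching-substring>
                    <xsl:text>X</xsl:text>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <!-- Everything that is NOT a period, copied through unchanged. -->
        <xsl:variable name="everythingElse">
            <xsl:analyze-string select="$sample" regex="\.">
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:text>Periods: </xsl:text>
        <xsl:value-of select="string-length($periods)"/>
        <xsl:text>&#10;Characters that are not periods: </xsl:text>
        <xsl:value-of select="string-length($everythingElse)"/>
    </xsl:template>
</xsl:stylesheet>
```

Swapping the regex for a comma or for &#8212; gives the same two measurements for another punctuation mark.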

Hope this helps!

@nlottig94
Owner Author

THE SVG IS FINISHED!!!! However, @amielnicki I can't figure out where to paste the svg in the dash html...So the svg is not in a folder. It is named Dickinson_SVG.svg. Let me know what to do @amielnicki

@ebeshero
Collaborator

I think Alex told us how he expected it to come in when we were on the phone earlier but I'm sorry I don't recall the details. I think he just said to paste it on the page. What happens when you do that--is it visible?

For guidance on positioning SVG on the HTML page, see our JavaScript Exercise 3 and look in the third bullet point under "Some guidance for stepping your way to a solution."

@ebeshero
Collaborator

And really, this should help: https://css-tricks.com/scale-svg/

@alexthattalks
Collaborator

It is live on the site!! http://dickinson16.newtfire.org/dash.html

@nlottig94
Owner Author

It looks awesome @ebeshero !!!!!!! CHECK IT OUT EVERYONE!!! @blawrence719 @brookestewart @ghbondar And thank you for your input @djbpitt !

@ebeshero
Collaborator

@amielnicki @nlottig94 @blawrence719 @brookestewart Wow! That was speedy work, everyone, and what a fascinating graph! I see Brooke Lawrence's intro, about, and conclusion pages are up now, too, and I like what you've done to flesh out the story of our repurposing from Michele Ierardi's old student site. Here are some questions for the team, since I know you're still working on this for another day, maybe two if you need it:

  1. Will you prepare a bibliography of the print and ms editions you worked with? You may want it on the SVG page to explain the symbols in your colorful legend.

  2. You should discuss the findings of your dash analysis, and it might make sense to do so on the same page with the graph. What other kinds of analysis might this lead to, if you were going to continue to study how Dickinson's punctuation was altered by later editions?

  3. How about a link (multiple links?) to your project GitHub, to highlight your fancy code work? You might discuss Alex's XSLT for the poem interface, Nicole's and Alex's use of image maps, your approach to calculating the percentage of reduction in dashes in the print editions...I bet a lot of this is coming in your Methodology page.

  4. Somewhere in the Home or About page is a mention of the old Assumptions page, which you'll want to update to the new page.

Really brilliant work so far, everyone!

@brookestewart
Collaborator

@ebeshero I have the dash analysis finished except for the bibliography. I could remove some of the bibliographical info from the paragraph that explains the abbreviations and leave just the full title once the bibliography is finished, and just have viewers reference that for more info.
Should I do it the way it is here: http://www.cs.virginia.edu/~ajf2j/emily/stab.html or MLA format? Does it matter?

@ebeshero
Collaborator

@brookestewart Follow the style for citations that I posted in our 19c Brit Lit Annotation Research Assignment: http://newtfire.org/19cBrit/AnnotResearchAssign.html. MLA style makes sense. You may want to add comments to each entry of the bibliography to provide a little information about each work. Read more about these editions here: https://www.emilydickinsonmuseum.org/posthumous_publication

@blawrence719 linked to that page on the home page she developed.
