feature: delete a slide #67

Open
scanny opened this issue Dec 10, 2013 · 35 comments

@scanny
Owner

scanny commented Dec 10, 2013

In order to modify an existing presentation to suit a new purpose
As a developer using python-pptx
I need the ability to delete a slide

API perhaps:

slides.remove(slide)

# OR

slides.remove(slide_a, slide_b, ...)
# alternately
slides.remove(*slides[2:4])

# OR

slide.delete()

scanny added the slide label Jun 15, 2014
scanny modified the milestones: v0.4.0, later Nov 16, 2014

@children1987

It is very useful! Looking forward to this.

@new-guy

new-guy commented Oct 6, 2015

I need this functionality for a project I'm currently working on. Any suggestions on where to get started for implementing it on my own?

@scanny
Owner Author

scanny commented Oct 8, 2015

I believe if you remove the slide reference from the presentation element and remove the relationship connecting the slide to the presentation, the slide will get dropped on save. Basically you undo what the _Slides.add_slide() function does here: https://github.com/scanny/python-pptx/blob/master/pptx/parts/presentation.py#L121

@new-guy

new-guy commented Oct 8, 2015

Cool! I'll give that a go

@jdgodchaux

@waveofbabies Did you have any luck with what @scanny proposed? Would you be willing to share what worked or didn't work? Thanks!

@jdgodchaux

@scanny What you proposed to @waveofbabies sounds like it would be pretty straightforward, but I'm not sure where to get started. Would you be willing to write out some example code? Thanks!

@jdgodchaux

I created the following in the _Slides() class. While looping through a presentation's slides, I passed in, one at a time, a defined range of slide numbers I wanted to remove:

def remove_slide(self, idx):
    rId = self._sldIdLst[idx].rId
    self._prs.drop_rel(rId)

This ended up removing the content of these slides, but left the slides themselves behind. So close!

In #68 @blaze33 described a technique they used for removing slides:

def delete_slide(self, presentation,  index):
    xml_slides = presentation.slides._sldIdLst  # pylint: disable=W0212
    slides = list(xml_slides)
    xml_slides.remove(slides[index])

I tried passing in slide numbers I wanted to remove using a loop like the above, but I was only successful in removing one slide -- the first slide. The rest of the slides I want to remove are stubbornly still present. Any help would be much appreciated!

@scanny
Owner Author

scanny commented Dec 18, 2015

@jdgodchaux I think your question might be a good one for Stack Overflow.

@krzys-andrei

thank you @jdgodchaux !
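This successfully removed slides #22 to #118 from the pptx file:

# Iterate from the highest index down so that each deletion does not
# shift the indices still to be processed. Note that range() excludes
# its stop value, so index 22 itself is kept.
for i in range(118, 22, -1):
    rId = prs.slides._sldIdLst[i].rId
    prs.part.drop_rel(rId)
    del prs.slides._sldIdLst[i]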

@jdgodchaux

@krzys-andrei Great! Glad I could be of assistance!

@EBjerrum

EBjerrum commented Apr 21, 2017

Thank you for the info, as I also needed to delete some slides. Based on @krzys-andrei's work, I made this function that deletes a slide object from a prs object.

def delete_slide(prs, slide):
    #Make dictionary with necessary information
    id_dict = { slide.id: [i, slide.rId] for i,slide in enumerate(prs.slides._sldIdLst) }
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]

@Krossfire9

(I'm newish to coding) I can't seem to get this function to actually delete anything. I nested it inside a for loop like so:

for n in range(5,1,-1):
    delete_slide(prs,prs.slides[n])

where prs is the presentation. It runs without error but makes no changes to the prs. Can anyone help spot what I did wrong?

@EBjerrum

EBjerrum commented Aug 7, 2017

It's been a while since I used the function, which I maybe didn't test with multiple deletes. I would be worried that you are updating prs.slides each time you call delete_slide. Does prs.slides[n] then really correspond to the slide you want in loop number 3?

How about

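# Take a snapshot list first, then slice it, so the loop does not
# depend on the collection that is being modified.
for slide in list(prs.slides)[1:5]:
    delete_slide(prs, slide)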

@Krossfire9

@EBjerrum Thanks for responding! I made a couple rookie mistakes (saved the doc before deleting slides) but the eventual code that worked was:

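def delete_slides(presentation, index):
    xml_slides = presentation.slides._sldIdLst
    slides = list(xml_slides)
    xml_slides.remove(slides[index])

# Each removal reindexes the remaining slides, so deleting index 0
# six times removes the first six slides.
for i in range(0, 6, 1):
    delete_slides(prs, 0)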

I was having trouble because every time I tried to delete the first slide (slide 0) it would reindex. I finally used that to my advantage deleting slide 0 six times. Not the most elegant solution, but it works

@will133

will133 commented Aug 25, 2017

If the Slides object has python list semantics, isn't it easier to implement the delete as:

del prs.slides[:]

?

@scanny
Owner Author

scanny commented Aug 25, 2017

@will133 I looked into implementing removal of an object from a collection like that a while back. Can't remember the object off the top of my head, but it was like this, a member of a collection. I don't remember the details without repeating the research, but I came to the clear conclusion, as I recall, that using the del statement was going to be a bad idea for something like this.

The del statement is really designed for removing names from the namespace and it's surprisingly complex to implement it in a robust way, things like reference counts and so on making it a crap shoot on whether you ever receive a __del__() call and where.

But I do like using a slice for elegant specification; the method could use a variable-length argument list to allow specifying a single slide, a number of individual slides, or a list (sequence) of slides to be deleted. Something like:

def delete_slides(*slides):
    ...

so you can call it like:

delete_slides(slide_x)

or

delete_slides(slide_x, slide_y)

or

delete_slides(*prs.slides[1:3])
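
A minimal sketch of how that variable-length signature could work, built only from the private calls used elsewhere in this thread (`_sldIdLst`, `drop_rel`, `slide_id`). It is not part of the python-pptx API: the name `delete_slides` and the explicit `prs` argument are assumptions for illustration. Slides are matched by id rather than by position, so removing one does not shift the others.

```python
def delete_slides(prs, *slides):
    """Remove each given slide from `prs` (hypothetical helper, not library API)."""
    sldIdLst = prs.slides._sldIdLst  # private attribute; may change between versions
    # Map slide id -> slide-list entry so each slide is found by id,
    # not by an index that shifts as earlier slides are removed.
    by_id = {sldId.id: sldId for sldId in sldIdLst}
    for slide in slides:
        sldId = by_id.pop(slide.slide_id)
        prs.part.drop_rel(sldId.rId)  # drop the presentation-to-slide relationship
        sldIdLst.remove(sldId)        # drop the entry from the slide list


# Usage, assuming `prs` is an open Presentation:
#   slides = list(prs.slides)
#   delete_slides(prs, slides[0])
#   delete_slides(prs, *slides[1:3])
```

Like the other snippets in this thread, this leaves part names untouched, so the partname collision discussed further down can still occur when a slide is added afterwards.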

@maribet

maribet commented Apr 19, 2018

I deleted some slides using the code example suggested here. But now I get an error when opening the resulting presentation in PowerPoint 2016. I can click 'Repair' and then everything looks fine, but of course I would prefer the presentation to open without an error. The presentation works fine before the slides are deleted. Any idea what goes wrong and how to fix it?

    def __delete_slide(self, index):
        presentation = self.prs
        xml_slides = presentation.slides._sldIdLst
        slides = list(xml_slides)
        xml_slides.remove(slides[index])

@jbq

jbq commented Apr 19, 2018

@maribet same for me, I can't delete slides right now

@maribet

maribet commented Apr 19, 2018

The python code works but it seems to somehow corrupt the pptx file. Is this an Office 2016 issue? Did it work with earlier Office versions?

@scanny
Owner Author

scanny commented Apr 19, 2018

I would expect you have some dangling relationship(s) still referring to the old slide. Doing this job cleanly in the general case involves coming to understand the full graph of objects the slide is embedded in and taking appropriate care to remove all the required links.

That complexity is one reason it hasn't been implemented in the API yet.

@maribet

maribet commented Apr 26, 2018

I solved my problem by hiding the slides. This way they are also not exported to PDF, so the solution works for me.
But a working DELETE for slides would be great!
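
A sketch of the hiding workaround, assuming it is done by setting the `show` attribute on the slide's XML element, which is how PresentationML marks a slide as hidden. Nothing in this thread points to a public python-pptx call for it, so `hide_slide` and `is_hidden` are hypothetical helpers that reach into the private `_element`.

```python
def hide_slide(slide):
    """Mark `slide` as hidden by setting show="0" on its slide element."""
    slide._element.set("show", "0")  # private attribute; hypothetical helper


def is_hidden(slide):
    """Return True when the slide element carries show="0" or show="false"."""
    return slide._element.get("show") in ("0", "false")


# Usage, assuming `prs` is an open Presentation:
#   for slide in list(prs.slides)[2:5]:
#       hide_slide(slide)
```

The slide stays in the file, so this only suits cases where the content may remain in the deck.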

@scanny
Owner Author

scanny commented Oct 3, 2018

A few notes on this for possible use later

The basic work of deleting a slide is to remove the relationship to that slide from the presentation part and remove its reference from the slide list (sldIdLst).

I don't believe there are any other "inbound" relationships referring to a slide that need to be dealt with. A Notes Slide is related to its slide, but the only way to get to it is from the slide itself (the relationship is "two-way").

All the other relationships I can think of can simply be ignored and they would disappear when the presentation is saved. A slide's relationship to its slide-layout is an example of that; when there is no relationship to the slide, the slide doesn't get written. When there's no slide, its relationships are not traversed and they are not written. The only thing that would be problematic is a relationship target that needed to be explicitly deleted lest it appear in the saved presentation somehow.

Images, charts, and hyperlinks are the three other common relationships. I'm inclined to think they would all be self-resolving. The best next step is probably to experiment a little and see what the behavior is when just deleting the slide from the presentation part and its relationships. It could be deleting a slide is substantially easier than deleting a shape (in the general case).

@loscil06

loscil06 commented Jan 4, 2019

Thanks for the hard work! Please let it happen, it'd be extremely useful if it gets implemented.

@bersbersbers

@Krossfire9:

I was having trouble because every time I tried to delete the first slide (slide 0) it would reindex. I finally used that to my advantage deleting slide 0 six times. Not the most elegant solution, but it works

Old one, but you could simply iterate backwards so reindexing does not change any indices ;)

@bersbersbers

I did experiment a bit further and found that while deleting slides works using a couple of different proposals from this issue, they are not complete (enough). In the example below, any of the methods creates a corrupt .pptx from scratch. Note that I start from an empty presentation, create two slides, remove the first and add a third. For both methods 1 and 2, I get this error:

/usr/lib64/python3.6/zipfile.py:1355: UserWarning: Duplicate name: 'ppt/slides/slide2.xml'
  return self._open_to_write(zinfo, force_zip64=force_zip64)
/usr/lib64/python3.6/zipfile.py:1355: UserWarning: Duplicate name: 'ppt/slides/_rels/slide2.xml.rels'
  return self._open_to_write(zinfo, force_zip64=force_zip64)

This is the code:

import pptx


def remove_slide(prs, idx):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-165708190
    rId = prs.slides._sldIdLst[idx].rId
    prs.part.drop_rel(rId)


def delete_slide(presentation, index):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-165708190
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-320792864
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-382660749
    xml_slides = presentation.slides._sldIdLst  # pylint: disable=W0212
    slides = list(xml_slides)
    xml_slides.remove(slides[index])


def delete_slide_2(prs, slide):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-296135015
    id_dict = {slide.id: [i, slide.rId] for i, slide in enumerate(prs.slides._sldIdLst)}
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]


method = 2

prs = pptx.Presentation()
slide0 = prs.slides.add_slide(prs.slide_layouts[0])
slide1 = prs.slides.add_slide(prs.slide_layouts[1])

if method == 0:
    remove_slide(prs, 0)
elif method == 1:
    delete_slide(prs, 0)
elif method == 2:
    delete_slide_2(prs, slide0)

slide2 = prs.slides.add_slide(prs.slide_layouts[2])
prs.save('bug.pptx')
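
The `Duplicate name` warning itself can be reproduced without python-pptx. A small standard-library sketch, assuming the saved package simply ends up with two parts written under the same name (the entry name is taken from the warning above; the contents are placeholders):

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with zipfile.ZipFile(buf, "w") as zf:
        # Two different payloads written under one archive name.
        zf.writestr("ppt/slides/slide2.xml", "<first/>")
        zf.writestr("ppt/slides/slide2.xml", "<second/>")

# Collect the warning text and the names stored in the archive.
messages = [str(w.message) for w in caught]
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print(messages)
print(names)
```

The archive is still written, but it holds two entries with one name, which is consistent with PowerPoint treating the file as damaged.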

@lokesh1729

Thank you for the info as I also needed to delete some slides. based on @krzys-andrei work I made this function that deletes a slide object from a prs object.

def delete_slide(prs, slide):
    #Make dictionary with necessary information
    id_dict = { slide.id: [i, slide.rId] for i,slide in enumerate(prs.slides._sldIdLst) }
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]

This works without giving the repair error, unlike the other one.

thank you very much....

@natter1

natter1 commented Jan 17, 2020

@bersbersbers is right, the method from @krzys-andrei still has a problem when adding a slide after deleting. In his example using delete_slide_2, the newly created slide has

part.partname=='/ppt/slides/slide2.xml'

which seems correct, as it is the second slide. But since the deleting did not change the partname of the remaining slides, this partname is already used by the now first slide in prs. So I guess a deleting method has to somehow change the partnames as well (or move all remaining slides following the deleted slide up, if this feature becomes available).

@natter1

natter1 commented Jan 17, 2020

Or maybe the problem is with add_slide:

@property
def _next_slide_partname(self):
    """
    Return |PackURI| instance containing the partname for a slide to be
    appended to this slide collection, e.g. ``/ppt/slides/slide9.xml``
    for a slide collection containing 8 slides.
    """
    sldIdLst = self._element.get_or_add_sldIdLst()
    partname_str = "/ppt/slides/slide%d.xml" % (len(sldIdLst) + 1)
    return PackURI(partname_str)

Does PowerPoint expect the partname (partname_str) to be similar to the slide index?
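
The collision can be reproduced with plain strings. A sketch using the same count-based formula as `_next_slide_partname` above; `first_unused_partname` is a hypothetical alternative for comparison, not library code:

```python
def next_partname_by_count(partnames):
    """Name the next slide from the number of slides, as in the property above."""
    return "/ppt/slides/slide%d.xml" % (len(partnames) + 1)


def first_unused_partname(partnames):
    """Hypothetical alternative: pick the lowest slide number not already in use."""
    n = 1
    while "/ppt/slides/slide%d.xml" % n in partnames:
        n += 1
    return "/ppt/slides/slide%d.xml" % n


# Two slides, then the first one is deleted without renaming the survivor.
partnames = ["/ppt/slides/slide1.xml", "/ppt/slides/slide2.xml"]
del partnames[0]

naive = next_partname_by_count(partnames)
collides = naive in partnames          # the count-based name is already taken
safe = first_unused_partname(partnames)
print(naive, collides, safe)
```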

@scanny
Owner Author

scanny commented Jan 17, 2020

The slide parts in a presentation can be renamed using PresentationPart.rename_slide_parts() here: https://github.com/scanny/python-pptx/blob/master/pptx/parts/presentation.py#L99

This is called the first time the prs.slides attribute is accessed for a given presentation, such that whatever they were named on disk, they now have consecutive and normalized names. https://github.com/scanny/python-pptx/blob/master/pptx/presentation.py#L111

You can call this whenever you want with:

prs.part.rename_slide_parts()

You could also just save and reload the deck after one or more slide-delete operations before trying to add a new one. No need to save to disk of course, just saving to BytesIO and then loading back in from that should get it done (and would be pretty quick).
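
A sketch of that in-memory round trip. The helper name `reload_in_memory` is hypothetical, and the loader is passed in as an argument (expected to be `pptx.Presentation`) purely so the sketch carries no import of its own:

```python
import io


def reload_in_memory(prs, open_presentation):
    """Save `prs` to an in-memory buffer and load it back from there.

    `open_presentation` is expected to be `pptx.Presentation`; passing it in
    is an illustrative choice, not something the library requires.
    """
    buf = io.BytesIO()
    prs.save(buf)   # save to a file-like object instead of a path
    buf.seek(0)     # rewind before reading the package back
    return open_presentation(buf)


# Usage:
#   from pptx import Presentation
#   prs = reload_in_memory(prs, Presentation)
```

Any slide objects held from before the reload belong to the old presentation, so they would need to be looked up again on the returned one.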

Note that part names are arbitrary. They can't be allowed to collide, but other than that, neither the naming "template" nor the "directory" location is prescribed by the PowerPoint spec. The reason for naming them consistent with their order in the deck is just to make the Zip structure more readable by humans.

I think @natter1 has correctly identified the proximate problem, that the PresentationPart._next_slide_partname() method is naive about finding partnames and does not assure uniqueness. I think the "production" fix would just be to call .rename_slide_parts() after each delete as that avoids any vagaries of what they might have been named before. Then the naive approach continues to work just fine and is efficient.
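
A sketch of that delete-then-rename sequence, using only the private calls that appear in this thread. It assumes `rename_slide_parts()` accepts the slide relationship ids in deck order; that argument is an assumption here and should be checked against the installed version. `delete_slide_and_renumber` is a hypothetical name.

```python
def delete_slide_and_renumber(prs, index):
    """Delete the slide at `index`, then renumber the remaining slide parts."""
    sldIdLst = prs.slides._sldIdLst   # private attribute; may change between versions
    sldId = sldIdLst[index]
    prs.part.drop_rel(sldId.rId)      # drop the presentation-to-slide relationship
    sldIdLst.remove(sldId)            # drop the entry from the slide list
    # Assumption: rename_slide_parts() takes the slide rIds in deck order.
    prs.part.rename_slide_parts([s.rId for s in sldIdLst])


# Usage, assuming `prs` is an open Presentation:
#   delete_slide_and_renumber(prs, 0)
#   prs.slides.add_slide(prs.slide_layouts[2])
```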

@Merthew

Merthew commented Oct 20, 2020

Is this still in development then? That's too bad; this feature would be awesome to have.

@MartinPacker

Although it's not the same, I wonder if "Hide A Slide" would be easier.

@joelw

joelw commented May 26, 2021

It's a little off topic, but in case it's useful for anyone, here's a function I created based on some of the previous solutions that will split up a PPTX into one file per slide. It works for me and doesn't cause any 'file requires repair' errors in PowerPoint.

Creating a separate instance of Presentation for each file was necessary because the slide deletion is destructive. I tried to use copy.deepcopy but this didn't seem to work.

from pptx import Presentation

def split_pptx(presentation, slide_number, write_file):
    # Walk the slide list backwards so deletions don't shift pending indices.
    for i in range(len(presentation.slides._sldIdLst), 0, -1):
        if i != slide_number:
            rId = presentation.slides._sldIdLst[i-1].rId
            presentation.part.drop_rel(rId)
            del presentation.slides._sldIdLst[i-1]
    presentation.part.rename_slide_parts(list(presentation.part.rels))
    presentation.save(write_file)

p = Presentation(file)
# Slide numbers are 1-based here, so the range must include the last slide.
for slide in range(1, len(p.slides) + 1):
    # Reload the file each time because the deletion is destructive.
    p_temp = Presentation(file)
    split_pptx(p_temp, slide, f"split_presentation-{slide}.pptx")

@himanshu-yaduvanshi

@scanny - I am using the below code to delete the 1st and 2nd slide out of the ppt of 5 slides:

slide_list_to_dlt = [1,2]
for sldNum in slide_list_to_dlt:
    rId = ppt.slides._sldIdLst[sldNum].rId
    ppt.part.drop_rel(rId)
    del ppt.slides._sldIdLst[sldNum]  

but this code is deleting the 1st and 3rd slide of the ppt.

Can you explain to me why it is happening?

Note: I took the reference from this thread only; you can check this in the comments above.

@scanny
Owner Author

scanny commented Dec 10, 2021

I expect it is because once you've deleted the first slide, what used to be the third slide is now the second slide.

An easy remedy is to delete in reverse order:

for sldNum in [2, 1]:
    ... delete slide ...
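
The index shift is easy to see with a plain list standing in for the slide list. A small sketch; `delete_indices` is a hypothetical helper that sorts the indices in descending order before deleting:

```python
def delete_indices(items, indices):
    """Delete the given positions from `items`, highest index first."""
    for i in sorted(indices, reverse=True):
        del items[i]
    return items


# Forward order: after index 1 is removed, index 2 points at a different item.
forward = list("ABCDE")
for i in [1, 2]:
    del forward[i]

# Reverse order: earlier positions are untouched by each deletion.
backward = delete_indices(list("ABCDE"), [1, 2])

print(forward)   # ['A', 'C', 'E']
print(backward)  # ['A', 'D', 'E']
```

Forward deletion removes the items originally at positions 1 and 3, while reverse deletion removes positions 1 and 2 as intended.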

@evdelen

evdelen commented Mar 29, 2022

I found this How To on deleting a slide from Microsoft, it documents additional steps that are not addressed in any of the methods described above: https://docs.microsoft.com/en-us/office/open-xml/how-to-delete-a-slide-from-a-presentation
