Power chip failures on v1.1 #92

Open
dtcallcock opened this issue Jun 16, 2021 · 11 comments

@dtcallcock
Member

Looks like IC6 pretty much exploded.

[Photo: IMG_4490]

The crate had been running for months and was working fine at the end of the day on Friday. I found it in this state on Monday morning, so the failure seems to have been spontaneous rather than user-inflicted. It was powered by the XP PSU it shipped with (via the backplane adaptor barrel connector), which was plugged into a surge protector strip. Kasli was located in a 19" subrack with forced-air cooling. Connected were 2x Urukul, 1x DIO_SMA, 1x Sampler and 1x DIO_RJ45.

Any ideas on why it might have done this?

@pathfinder49

I've also seen an exploding SMPS on Kasli 1.1 👀

@gkasprow
Member

Such explosions happen when the input voltage is exceeded. Usually, all rails get fried.
The chip has thermal protection, so overheating should not cause such issues.
The black residue around L16 and the P3V3 test point looks suspicious. It looks as if some fluid was flowing and caused a short circuit.
When chips explode, they usually don't leave such traces.

@dtcallcock
Member Author

dtcallcock commented Jun 16, 2021

> I've also seen an exploding SMPS on Kasli 1.1

> Such explosions happen when the input voltage is exceeded.

Is there an issue with the quality of PSUs then? To be clear, I'm using an XP Power AKM65US12. I just checked it and it puts out 12.38V unloaded and 11.95V into a 10R resistive load (14W).
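As a quick sanity check of the load-test figures above (plain Ohm's-law arithmetic on the two quoted readings, not a characterisation of the PSU), the current, dissipated power and voltage droop work out as follows:

```python
# Readings quoted above for the XP Power AKM65US12.
V_UNLOADED = 12.38  # V, measured with no load
V_LOADED = 11.95    # V, measured across the 10 ohm resistor
R_LOAD = 10.0       # ohm, resistive test load


def load_test(v_unloaded: float, v_loaded: float, r_load: float):
    """Return (current in A, power in W, droop in %) for a resistive load test."""
    current = v_loaded / r_load          # I = V / R
    power = v_loaded ** 2 / r_load       # P = V^2 / R
    droop_pct = 100.0 * (v_unloaded - v_loaded) / v_unloaded
    return current, power, droop_pct


i, p, d = load_test(V_UNLOADED, V_LOADED, R_LOAD)
print(f"I = {i:.3f} A, P = {p:.1f} W, droop = {d:.1f} %")
```

This prints `I = 1.195 A, P = 14.3 W, droop = 3.5 %`, consistent with the 14W quoted for the test.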

> The black residue around L16 and P3V3 test point look suspicious. It looks like some fluid was flowing and causing the short circuit.

It does, but 'to the right' in this photo would have been 'up' when mounted in the rack, so I'm not sure fluid would flow like this. Also, there are no traces of fluid on the fan tray or the other electronics that sit above it in the rack.

@dtcallcock
Member Author

Obviously v1.1 is somewhat historical, but AFAICT this part of the board is very similar on v2.

> Such explosions happen when the input voltage is exceeded.

For reference, the ADP5052 Vin is specced as 4.5-15V with an abs. max of 18V, so there is quite a bit of safety margin.

> I'm using an XP Power AKM65US12

Btw, I note that the schematic specifies the Mean Well GSM90B12-P1M. That PSU has overvoltage protection that would shut it down at <135% = 16.2V (i.e. below the abs. max). The XP doesn't, so perhaps that's a clue. Any idea which PSUs are generally out there in the wild?
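To make the comparison above concrete, here is a small sketch that places each supply voltage mentioned so far against the ADP5052 input limits quoted above (4.5-15V operating, 18V abs. max). The 135% figure is treated as the upper bound on the Mean Well's OVP trip point, as quoted; the actual trip point may be lower.

```python
# ADP5052 input limits, as quoted earlier in this thread.
VIN_MIN = 4.5       # V, lower end of the specced operating range
VIN_MAX = 15.0      # V, upper end of the specced operating range
VIN_ABS_MAX = 18.0  # V, absolute maximum rating

NOMINAL = 12.0      # V, nominal PSU output
OVP_LIMIT = 1.35    # Mean Well OVP shuts down at < 135 % of nominal (upper bound)


def classify(vin: float) -> str:
    """Place an input voltage relative to the ADP5052 limits quoted above."""
    if vin > VIN_ABS_MAX:
        return "above abs. max"
    if vin > VIN_MAX:
        return "above operating range, below abs. max"
    if vin < VIN_MIN:
        return "below operating range"
    return "within operating range"


cases = {
    "AKM65US12 unloaded": 12.38,
    "AKM65US12 into 10R": 11.95,
    "GSM90B12-P1M worst-case OVP trip": NOMINAL * OVP_LIMIT,
}
for name, vin in cases.items():
    print(f"{name}: {vin:.2f} V -> {classify(vin)}")

# Headroom from the nominal 12 V to each upper limit.
print(f"headroom to operating max: {100 * (VIN_MAX / NOMINAL - 1):.0f} %")
print(f"headroom to abs. max: {100 * (VIN_ABS_MAX / NOMINAL - 1):.0f} %")
```

With these figures, both AKM65US12 readings are within the operating range, while the worst-case Mean Well trip point (16.2V) sits above the 15V operating limit but still below the 18V abs. max. The nominal 12V has 25% headroom to the operating max and 50% to the abs. max.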

@sbourdeauducq
Member

We've shipped plenty of systems with the AKM65US12, and have not seen this issue.

@dtcallcock
Member Author

> We've shipped plenty of systems with the AKM65US12

Are all systems shipped with the AKM65US12? Does anyone know why the schematic is not followed (or what even motivated the schematic choice in the first place)?

@marmeladapk
Member

marmeladapk commented Jun 17, 2021 via email

@dtcallcock
Member Author

The replacement board blew up in exactly the same way. The vapourized silicon and metal even exited the chip package in the same location! Again, it did this spontaneously when nobody was in the lab, after months of running happily. When we swapped this board in, we also replaced the PSU with the Mean Well one with overvoltage protection discussed above. This probably rules out a rogue PSU and the overvoltage theory.

[Photos: IMG_20230109_124957589_HDR, IMG_5797]

@dtcallcock changed the title from "Power chip failure on v1.1" to "Power chip failures on v1.1" on Jan 9, 2023

@gkasprow
Member

gkasprow commented Jan 9, 2023

It looks like mid-layer 1, where 12V is routed, got really hot, as did the GND return path on the top layer.
Your power supply must be delivering a lot of current.
It looks like the DC/DC converter shorted the 12V rail.
Moreover, it looks like the 3V3 converter was affected first, probably killing all the 3V3 circuits, such as the SFPs.

@dtcallcock
Member Author

dtcallcock commented Jan 9, 2023

> Your power supply must be delivering a lot of current.

It's a 6.67A PSU.
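For scale, a rough estimate of how much power such a supply can push into a fault, assuming it holds roughly its nominal 12V at its rated 6.67A (in a hard short the voltage collapses, and a current-limited supply may fold back or hiccup, so treat this only as an order of magnitude):

```python
NOMINAL_V = 12.0  # V, nominal PSU output
RATED_A = 6.67    # A, PSU rating quoted above


def available_power(voltage_v: float, current_a: float) -> float:
    """Power the PSU can deliver into a fault at the given voltage and current (P = V * I)."""
    return voltage_v * current_a


p_fault = available_power(NOMINAL_V, RATED_A)
print(f"~{p_fault:.0f} W available at rated current")
```

This prints `~80 W available at rated current`, more than five times the 14W dissipated in the 10R load test, which is plenty to overheat a partially shorted converter and the 12V copper feeding it.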

> Moreover, it looks like the 3V3 converter was affected first, probably killing all 3V3 circuits like SFPs.

Any chance a bad SFP caused this? I believe we reused the SFP from the first fried board. It's hard to imagine it fried the board without frying itself, though!

Do you want these fried boards for a post-mortem? I guess they're not interesting unless/until you start seeing v2 boards fail in the field (which it sounds like isn't happening, even though this part of the board is basically the same).

@dtcallcock
Member Author

dtcallcock commented Jan 9, 2023

Is it worth considering a fuse on Kasli, especially given how many random things pull power via the EEMs? It presumably wouldn't have prevented this failure, but at least there would be a better chance of tracking down the issue (and even repairing the board; they aren't cheap!) rather than leaving a smoking wreckage behind.
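If a fuse were added, one rough way to pick its rating is sketched below. Everything here is an assumption for illustration: the load current is a hypothetical placeholder (the actual Kasli plus EEM draw isn't given in this thread), the 0.75 derating factor is a typical rule-of-thumb value that should be replaced by the figure from the chosen fuse's datasheet, and the rating list is the R10-style preferred series commonly used for miniature fuses (e.g. IEC 60127). The sketch also flags whether the chosen rating sits below the 6.67A PSU rating, since a fuse can only clear a fault if the supply can drive well past the fuse's rating; that check is necessary but not sufficient.

```python
# Hypothetical fuse-sizing sketch; none of these numbers come from the Kasli design.
# R10-style preferred rated currents (A), as commonly used for miniature fuses.
STANDARD_RATINGS_A = [0.5, 0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0]

PSU_RATED_A = 6.67  # A, PSU rating quoted in this thread


def pick_fuse(load_current_a: float, derating: float = 0.75) -> float:
    """Return the smallest listed rating whose derated value still carries the steady load."""
    for rating in STANDARD_RATINGS_A:
        if rating * derating >= load_current_a:
            return rating
    raise ValueError("load too large for the listed ratings")


load = 3.0  # A, HYPOTHETICAL total steady-state draw of Kasli plus EEM peripherals
fuse = pick_fuse(load)
print(f"{fuse} A fuse; below PSU rating: {fuse < PSU_RATED_A}")
```

For the hypothetical 3A load this picks a 4A fuse (4A × 0.75 = 3A), which is below the PSU rating. A real design would also need to check the fuse's time-current curve against the inrush of the EEM boards so it doesn't blow at power-up.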
